In this paper, we consider a criminal investigation into the collective guilt of some members of a working group. Assuming the message statistics we use are reliable, we present a PageRank model based on mutual information. First, we use the average mutual information between non-suspicious and suspicious topics to score the topics by degree of suspicion. Second, we build a correlation matrix from these suspicion scores and derive the corresponding Markov state-transition matrix. Then we set initial values for all members of the working group according to their degree of suspicion. Finally, we calculate the suspicion score of each member of the working group. For the small ten-person case, we build an improved PageRank model; computing the statistics of this case yields a table ranking members by degree of suspicion. Comparing this ranking with the results given in the problem statement, we find the two largely agree, indicating that the model is feasible. For the current case, we first obtain a ranking of the 15 topics by suspicion via the mutual-information-based PageRank model. Second, we build the connection matrix from the suspicion scores and derive the corresponding Markov state-transition matrix. Then we obtain the fixed point of that transition matrix using the Markov chain. Finally, we calculate the suspicion score of each of the 83 candidates. The results show that the known conspirators sit at the top of the ranking while the known innocent people sit at the bottom, again indicating that the model is feasible. When the suspicious topics and known conspirators are changed, the model still produces reasonably good results. In the current case, we have evidence to believe that Dolores and Jerome, who are the senior managers, are under significant suspicion; we recommend that future attention be paid to them.
The PageRank model based on mutual information takes full account of the information flow in a message-distribution network. It can not only handle the statistics of the conspiracy case but can also be applied to detecting infected cells in a biological network. Finally, we present the advantages and disadvantages of the model and directions for improvement.
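The pipeline the abstract describes (suspicion-weighted edges, a Markov state-transition matrix, power iteration to its stable point) can be sketched as follows. The network, weights, and initial suspicion scores below are invented for illustration and are not the paper's data.

```python
# Sketch: suspicion scores weight the edges of a message network, the
# weighted adjacency matrix is column-normalized into a Markov transition
# matrix, and damped power iteration finds its stable point.

def to_transition_matrix(weights):
    """Column-normalize a weighted adjacency matrix (list of rows)."""
    n = len(weights)
    cols = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    return [[weights[i][j] / cols[j] if cols[j] else 1.0 / n
             for j in range(n)] for i in range(n)]

def pagerank(M, init, damping=0.85, tol=1e-9, max_iter=1000):
    """Power iteration with damping, started from suspicion-based values."""
    n = len(M)
    total = sum(init)
    r = [x / total for x in init]                  # normalized initial scores
    for _ in range(max_iter):
        nxt = [(1 - damping) / n +
               damping * sum(M[i][j] * r[j] for j in range(n))
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, r)) < tol:
            break
        r = nxt
    return r

# Toy 4-member network: W[i][j] = suspicion-weighted messages from j to i.
W = [[0, 2, 1, 0],
     [3, 0, 0, 1],
     [1, 1, 0, 2],
     [0, 1, 2, 0]]
scores = pagerank(to_transition_matrix(W), init=[1, 4, 1, 2])
ranking = sorted(range(4), key=lambda i: -scores[i])   # most suspicious first
```

The damping term plays the same role as in the classic web PageRank: it guarantees the chain is ergodic, so the iteration converges to a unique stable point regardless of the initial scores.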
This document summarizes a research paper on analyzing sentiments from Twitter data using data mining techniques. The paper presents an approach for analyzing user sentiments using data mining classifiers and compares the performance of single classifiers versus an ensemble of classifiers for sentiment analysis. Experimental results show that the k-nearest neighbor classifier achieved very high predictive accuracy, and single classifiers outperformed the ensemble approach.
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap... (IRJET Journal)
This document presents a hybrid approach for sentiment analysis that combines a lexicon-based technique and a machine learning technique using recurrent neural networks. It aims to analyze sentiments expressed in tweets towards products and services more accurately. The proposed model first cleans tweets collected from Twitter APIs. It then classifies the tweets' sentiment using both a lexicon-based technique using TextBlob and an LSTM-RNN model. The hybrid approach provides not only classification of sentiment but also a score of sentiment strength. This combined approach seeks to gain deeper insights than single techniques alone.
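The lexicon half of such a hybrid can be sketched in a few lines: a polarity lexicon scores each token, a simple negation rule flips the sign, and the average yields both a label and a strength score. The toy lexicon and negation rule below stand in for TextBlob, and the LSTM-RNN half is omitted.

```python
# Toy lexicon-based sentiment scorer: word polarities, simple negation,
# and an averaged strength score alongside the discrete label.

LEXICON = {"good": 1.0, "great": 1.0, "love": 0.8,
           "bad": -1.0, "awful": -1.0, "hate": -0.8}
NEGATORS = {"not", "no", "never"}

def lexicon_sentiment(tweet):
    scores, negate = [], False
    for tok in tweet.lower().split():
        if tok in NEGATORS:
            negate = not negate            # negation flips the next polarity
        elif tok in LEXICON:
            scores.append(-LEXICON[tok] if negate else LEXICON[tok])
            negate = False
    strength = sum(scores) / len(scores) if scores else 0.0
    label = ("positive" if strength > 0
             else "negative" if strength < 0 else "neutral")
    return label, strength
```

In the hybrid described above, this strength score would be combined with the LSTM-RNN's class probability rather than used alone.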
IRJET - Implementation of Twitter Sentimental Analysis According to Hash Tag (IRJET Journal)
This document proposes a model for analyzing sentiment from tweets using hashtags. It involves collecting tweets, preprocessing the data by removing URLs and stopwords, training a classifier using Naive Bayes, and classifying tweets as positive, negative, or neutral. Hashtags are also classified to help organize tweets by topic. The proposed system is intended to help large companies understand public sentiment about their brands by analyzing tweets in real-time.
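A minimal sketch of that pipeline, assuming a toy training set and stopword list; a production system would use the full preprocessing and a real labelled corpus.

```python
# URL/stopword preprocessing plus a multinomial Naive Bayes classifier
# with Laplace smoothing, trained on a handful of toy labelled tweets.
import math
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "is", "to", "and", "this"}

def preprocess(tweet):
    tweet = re.sub(r"https?://\S+", "", tweet.lower())     # strip URLs
    return [t for t in re.findall(r"[a-z']+", tweet) if t not in STOPWORDS]

class NaiveBayes:
    def fit(self, tweets, labels):
        self.counts = defaultdict(Counter)   # label -> token counts
        self.priors = Counter(labels)
        self.vocab = set()
        for tweet, y in zip(tweets, labels):
            toks = preprocess(tweet)
            self.counts[y].update(toks)
            self.vocab.update(toks)
        return self

    def predict(self, tweet):
        toks = preprocess(tweet)
        best, best_lp = None, -math.inf
        n = sum(self.priors.values())
        for y, prior in self.priors.items():
            lp = math.log(prior / n)
            total = sum(self.counts[y].values())
            for t in toks:                   # Laplace-smoothed likelihoods
                lp += math.log((self.counts[y][t] + 1) /
                               (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = y, lp
        return best

nb = NaiveBayes().fit(
    ["love this product http://x.co", "great service",
     "awful support", "hate the delays"],
    ["positive", "positive", "negative", "negative"])
```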
Using Networks to Measure Influence and Impact (Yunhao Zhang)
1. The document describes a network analysis model to measure influence in networks. It builds a co-authorship network of researchers who collaborated with mathematician Paul Erdos and analyzes its properties.
2. It proposes an Entropy-Weight-Based Gray Relational Analysis (EWGRA) model to quantitatively analyze correlations between subjects while avoiding human interference in weighting. This model combines centrality measures and PageRank to measure the most influential researcher.
3. The model also proposes a Food Chain Model to measure the significance of research papers by simulating how information spreads. By applying these models to different networks, the most influential researcher and research paper are identified.
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ... (IRJET Journal)
This document proposes using Twitter sentiment analysis and an LSTM neural network to predict election results. It involves collecting tweets related to political parties and candidates in India, cleaning the data, training an LSTM classifier on labeled tweets, and using the trained model to classify tweets as positive, negative or neutral sentiment and compare sentiment levels for each candidate. The goal is to analyze public sentiment expressed on Twitter and how it correlates with actual election outcomes.
In the age of social media communication, it is easy to manipulate users' minds and, in some cases, instigate violent actions by them. There is a need for a system that can analyze the threat level of tweets from influential users and rank their Twitter handles, so that dangerous tweets can be kept from going public on Twitter before fact-checking, since such tweets can hurt people's sentiments and escalate into violence. This study aims to analyze and rank Twitter users according to their influence and the extremism of their tweets, to help prevent major protests and violent events. We scraped top trending topics and fetched tweets using those hashtags. We propose a custom ranking algorithm that considers source-based and content-based features, along with a knowledge graph, to generate scores and rank Twitter users accordingly. Our aim in this study is to identify and rank extremist Twitter users with regard to their impact and influence, using a technique that takes both source-based and content-based features of tweets into account to rank the extremist Twitter users with the highest impact.
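A ranking of this general shape (a weighted sum of source-based and content-based features) might look like the sketch below. The feature names, weights, and users are invented for illustration, and the knowledge-graph component is omitted.

```python
# Hypothetical combined score: a source-based feature (normalized follower
# count) and a content-based feature (mean extremism score of a user's
# tweets), blended by fixed weights, then sorted to rank the handles.

def rank_users(users, w_source=0.4, w_content=0.6):
    max_f = max(u["followers"] for u in users) or 1
    def score(u):
        source = u["followers"] / max_f                      # influence proxy
        content = sum(u["tweet_extremism"]) / len(u["tweet_extremism"])
        return w_source * source + w_content * content
    return sorted(users, key=score, reverse=True)

users = [
    {"handle": "@calm", "followers": 50000, "tweet_extremism": [0.1, 0.0, 0.2]},
    {"handle": "@loud", "followers": 8000,  "tweet_extremism": [0.9, 0.8, 0.95]},
    {"handle": "@mid",  "followers": 30000, "tweet_extremism": [0.5, 0.4, 0.6]},
]
ranked = [u["handle"] for u in rank_users(users)]
```

Note how the content weight lets a low-follower but highly extreme account outrank a large benign one, which matches the stated goal of surfacing dangerous tweets before they spread.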
This document presents research on author profiling of tweets using neural networks. The researchers used convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to classify tweets by gender and age based on lexical and syntactic features. Their best models achieved accuracies of 74.55% for gender classification using CNNs and 73.32% for age classification using RNNs. While support vector machines performed well, feature engineering did not generalize well to tweets. Neural networks were able to learn meaningful patterns from limited Twitter data without extensive feature engineering.
The document presents Kuralagam, a concept relation based search engine for Thirukkural (a classic Tamil literature). It uses CoReX, a concept relation indexing technique, to index Thirukkurals based on concepts and relations rather than keywords. This allows retrieval of semantically related Thirukkurals. The search engine was evaluated against traditional keyword search and found to have higher average precision (0.83 vs 0.52). It presents results in three tabs based on exact matches, related concepts, and expanded queries.
Sensing Trending Topics in Twitter for Greater Jakarta Area (IJECEIAES)
Information and communication technology is growing fast nowadays, especially around the internet. Twitter is one internet application that produces a large amount of textual data, called tweets. Tweets may represent real-world situations discussed in a community, so Twitter can be an important medium for urban monitoring; the ability to monitor these situations may guide local government to respond quickly or to shape public policy. Topic detection is an important automatic tool for understanding tweets, for example using non-negative matrix factorization. In this paper, we conducted a study using Twitter as a medium for urban monitoring in Jakarta and its surrounding areas, called Greater Jakarta. First, we analyze the accuracy of the detected topics in terms of their interpretability. Next, we visualize the trend of the topics to identify popular topics easily. Our simulations show that the topic detection methods can extract topics at a certain level of accuracy and draw their trends, so that topic monitoring can be conducted easily.
A Survey Of Collaborative Filtering Techniques (tengyue5i5j)
This document provides a survey of collaborative filtering techniques. It begins with an introduction to collaborative filtering and its main challenges, such as data sparsity, scalability, and synonymy. It then describes three main categories of collaborative filtering techniques: memory-based, model-based, and hybrid approaches. Representative algorithms from each category are discussed and analyzed in terms of their predictive performance and ability to address collaborative filtering challenges. The document concludes with a discussion of evaluating collaborative filtering algorithms and commonly used datasets.
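The memory-based category the survey describes can be illustrated with a minimal user-based filter: cosine similarity between users' rating vectors, then a similarity-weighted average to predict a missing rating. The ratings below are a toy example.

```python
# Minimal user-based (memory-based) collaborative filtering sketch.
import math

ratings = {  # user -> {item: rating}
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "C": 2, "D": 1},
}

def cosine(u, v):
    """Cosine similarity over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)

def predict(user, item):
    """Similarity-weighted average over neighbours who rated the item."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        s = cosine(ratings[user], r)
        num += s * r[item]
        den += abs(s)
    return num / den if den else None
```

Since alice's tastes align with bob's far more than with carol's, `predict("alice", "D")` lands closer to bob's rating of 4 than to carol's rating of 1, which is exactly the sparsity-tolerant behavior the survey attributes to memory-based methods.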
IMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATION (adeij1)
The document presents the results of a study that compares several natural language processing (NLP) techniques for sentiment analysis: distilBERT, VADER, and LSTM. DistilBERT achieved the highest accuracy of 92.4% for predicting sentiment polarity in a restaurant review dataset, significantly outperforming VADER (72.3% accuracy) and conventional LSTM approaches (78% accuracy). The document describes the methodology used in the comparative study and provides details on how each NLP technique approaches the task of sentiment analysis.
The document presents a study that analyzes sentiment on Twitter using various classification algorithms. It compares the performance of Naive Bayes, Bayes Net, Discriminative Multinomial Naive Bayes, Sequential Minimal Optimization, Hyperpipes, and Random Forest algorithms on a Twitter sentiment dataset. The study finds that Discriminative Multinomial Naive Bayes and Sequential Minimal Optimization algorithms have the best performance with overall F-scores of 0.769 and 0.75, respectively. The study aims to determine the most accurate and efficient algorithms for Twitter sentiment classification.
Syntactic search relies on keywords contained in a query to find suitable documents, so documents that do not contain the keywords but do contain information related to the query are not retrieved. Spreading activation is an algorithm for finding latent information in a query by exploiting relations between nodes in an associative or semantic network. However, the classical spreading activation algorithm uses all relations of a node in the network, which adds unsuitable information to the query. In this paper, we propose a novel approach for semantic text search, called query-oriented constrained spreading activation, that only uses relations relevant to the content of the query to find genuinely related information. Experiments on a benchmark dataset show that, in terms of the MAP measure, our search engine is 18.9% and 43.8% better, respectively, than syntactic search and search using classical constrained spreading activation.
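The constrained variant can be sketched as follows. The tiny semantic network and relation types are invented, and the paper's query analysis is reduced to a fixed set of allowed relation types, which is the essence of the query-oriented constraint.

```python
# Constrained spreading activation: activation starts at the query
# concepts and decays as it spreads, but only along relation types the
# query allows, so unrelated senses of a word stay inactive.

# network[node] = list of (neighbour, relation_type) edges
network = {
    "jaguar": [("cat", "is_a"), ("car_brand", "made_by")],
    "cat": [("animal", "is_a"), ("jaguar", "is_a")],
    "car_brand": [("vehicle", "is_a")],
    "animal": [], "vehicle": [],
}

def spread(seeds, allowed, decay=0.5, threshold=0.1):
    activation = {s: 1.0 for s in seeds}
    frontier = list(seeds)
    while frontier:
        node = frontier.pop()
        for nbr, rel in network.get(node, []):
            if rel not in allowed:           # the query-oriented constraint
                continue
            a = activation[node] * decay
            if a > activation.get(nbr, 0.0) and a >= threshold:
                activation[nbr] = a
                frontier.append(nbr)
    return activation

# A query about the animal sense of "jaguar" allows only "is_a" relations,
# so the "car_brand" branch is never activated:
act = spread({"jaguar"}, allowed={"is_a"})
```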
Misconduct disclosure of the intermediates using the trusted authority (eSAT Publishing House)
IJRET: International Journal of Research in Engineering and Technology is an international, peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching, and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars, and students of related fields of Engineering and Technology.
This document discusses using complexity analysis as an advanced method for data mining. It involves 3 steps: 1) Measuring the information content in scatter plots using image entropy. 2) Identifying relationships between variables using mutual information. 3) Repeating steps 1 and 2 for all variable pairs to map dependencies and identify key interrelated and isolated variables. Complexity is defined by the number and nature of relationships, providing insights into system controllability, predictability, and distance from critical complexity. Complexity analysis can extract more useful insights than simple correlations when data is complex, turbulent or chaotic.
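Step 2 above (pairwise mutual information) follows directly from the definition I(X;Y) = Σ p(x,y) log₂[p(x,y) / (p(x) p(y))]. The sketch below assumes the variables have already been discretized, as step 1's binning would produce.

```python
# Mutual information between two discrete variables, from the definition.
import math
from collections import Counter

def mutual_information(xs, ys):
    n = len(xs)
    pxy = Counter(zip(xs, ys))               # joint counts
    px, py = Counter(xs), Counter(ys)        # marginal counts
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

# A perfectly dependent pair carries 1 bit; an independent pair carries ~0.
dependent = mutual_information([0, 1, 0, 1], [1, 0, 1, 0])
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Repeating this over all variable pairs, as step 3 describes, yields the dependency map from which interrelated and isolated variables are identified.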
News Reliability Evaluation using Latent Semantic Analysis (TELKOMNIKA JOURNAL)
The rapid rise and spread of "fake news" has severe implications for society today, and much effort has been directed in recent years towards developing methods to verify news reliability on the internet. In this paper, an automated news reliability evaluation system is proposed. The system utilizes several Natural Language Processing (NLP) techniques, such as Term Frequency-Inverse Document Frequency (TF-IDF), phrase detection, and cosine similarity, in tandem with Latent Semantic Analysis (LSA). A collection of 9203 labelled articles from both reliable and unreliable sources was assembled and divided by a random test-train split into training and testing datasets. The final results show 81.87% precision and 86.95% recall, with an accuracy of 73.33%.
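Two of the stages named above, TF-IDF weighting and cosine similarity, can be built straight from their textbook definitions; the LSA step (an SVD of the resulting matrix) and phrase detection are omitted here, and the three-document corpus is a toy example.

```python
# TF-IDF vectors as sparse dicts, compared by cosine similarity.
import math
from collections import Counter

def tfidf_vectors(docs):
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    # document frequency: in how many docs each term appears
    df = Counter(t for toks in tokenized for t in set(toks))
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({t: (c / len(toks)) * math.log(n / df[t])
                     for t, c in tf.items()})
    return vecs

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = ["the markets fell sharply today",
        "stock markets fell again today",
        "the recipe needs two eggs"]
v = tfidf_vectors(docs)
```

The two finance documents score markedly higher against each other than against the recipe, which is the signal LSA then compresses into a lower-dimensional topic space.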
This document summarizes research on sentiment analysis of Twitter data. It discusses how sentiment analysis can classify tweets as positive, negative, or neutral. It reviews different techniques for sentiment analysis, including machine learning approaches like Naive Bayes classifiers and lexicon-based approaches. The document also describes prior studies that have used sentiment analysis techniques to predict security attacks based on Twitter sentiment and explore improvements in classification accuracy. In general, the document outlines common methods for analyzing sentiment in social media data and highlights past applications of the analysis.
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING (ijnlc)
Nearly 70% of people are concerned about the propagation of fake news. This paper aims to detect fake news in online articles through the use of semantic features and various machine learning techniques. In this research, we investigated recurrent neural networks versus the naive Bayes and random forest classifiers using five groups of linguistic features. Evaluated on a real-or-fake dataset from kaggle.com, the best performing model achieved an accuracy of 95.66% using bigram features with the random forest classifier. The fact that bigrams outperform unigrams, trigrams, and quadgrams shows that word pairs, as opposed to single words or longer phrases, best indicate the authenticity of news.
Semantic similarity, and semantic relatedness measures in particular, are very important in the current scenario due to the huge demand for natural-language-processing-based applications such as chatbots and for information retrieval systems such as knowledge-base-backed FAQ systems. Current approaches generally use similarity measures that do not exploit the context-sensitive relationships between words, which leads to erroneous similarity predictions and is of limited use in real-life applications. This work proposes a novel approach that gives an accurate relatedness measure for any two words in a sentence by taking their context into consideration. This context correction results in more accurate similarity prediction, which in turn yields higher accuracy in information retrieval systems.
Exploring the Current Trends and Future Prospects in Terrorist Network Mining (cscpconf)
In today's hi-tech era, criminals can easily pursue their inhuman goals against mankind, so the security of civilians has become significantly more important. In this regard, law-enforcement agencies aim to prevent future attacks. To do so, terrorist networks are analyzed using data mining techniques. One such technique is social network analysis, which studies terrorist networks to identify relationships and associations that may exist between terrorist nodes. Terrorist activities can also be detected by analyzing web traffic content. This paper studies social network analysis and web traffic content, and explores various ways of identifying terrorist activities.
Prediction of Reaction towards Textual Posts in Social Networks (Mohamed El-Geish)
Posting on social networks can be a gratifying or a terrifying experience, depending on the reaction the post (and, by association, its author) receives from readers. To better understand what makes a post popular, this project inquires into the factors that determine the number of likes, comments, and shares a textual post gets on LinkedIn, and finds a predictor function that can estimate those quantitative social gestures.
The document summarizes a student project to build a question answering search engine called "Ask Yahoo!" that retrieves answers directly from the Yahoo Answers database. The project involved:
1. Cleaning and indexing over 4 million Yahoo Answers questions and answers to create searchable indexes.
2. Developing a search engine frontend where users can enter questions and receive top matching answers below the search box.
3. Implementing a ranking system using TF-IDF that scores documents based on user query terms.
4. Evaluating performance on a test set where friends rewrote questions to check if the top answer matched the original question. The system achieved a score of 0.7 for correctly answering questions after query
IRJET- Fake News Detection using Logistic Regression (IRJET Journal)
1) The document discusses a study that uses logistic regression to classify news articles as real or fake. It outlines the methodology which includes data preprocessing, feature extraction using bag-of-words and TF-IDF, and using a logistic regression classifier to predict fake news.
2) The model achieved an accuracy of approximately 72% at classifying news as real or fake when using TF-IDF features and logistic regression.
3) The study aims to address the growing issue of fake news proliferation online by developing a computational method for identifying unreliable news sources.
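The pipeline in the summary can be reduced to a toy end-to-end example: bag-of-words features and a from-scratch logistic regression trained by gradient descent. The corpus, vocabulary, and labels are invented, and the study's TF-IDF weighting is replaced by raw counts for brevity.

```python
# Bag-of-words featurization plus logistic regression via per-sample
# gradient descent on the log-loss; label 1 = fake, 0 = real.
import math

def featurize(text, vocab):
    toks = text.lower().split()
    return [toks.count(w) for w in vocab]

def train(X, y, lr=0.5, epochs=500):
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1 / (1 + math.exp(-z))
            g = p - yi                      # gradient of the log-loss
            b -= lr * g
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
    return w, b

def predict(text, vocab, w, b):
    z = b + sum(wj * xj for wj, xj in zip(w, featurize(text, vocab)))
    return 1 / (1 + math.exp(-z))           # probability of "fake"

corpus = [("shocking miracle cure doctors hate", 1),
          ("you won a free prize click now", 1),
          ("parliament passed the budget bill today", 0),
          ("the court ruled on the appeal", 0)]
vocab = sorted({word for text, _ in corpus for word in text.split()})
X = [featurize(text, vocab) for text, _ in corpus]
y = [label for _, label in corpus]
w, b = train(X, y)
```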
Evolving Swings (topics) from Social Streams using Probability Model (IJERA Editor)
This document presents a probability model for detecting evolving topics from social media streams. It focuses on the social aspects of posts reflected in user mentioning behavior, rather than textual content. The model captures the number of mentions per post and frequency of mentioned users. It analyzes individual posting anomalies and aggregates anomaly scores. SDNML change point analysis and burst detection are then used to identify evolving topics from the aggregated scores. Experimental results show that link-based detection using this approach performs better than key-based detection using textual content alone. The model overcomes limitations of prior frequency-based approaches and can detect topics from both textual and non-textual social media posts.
Sentiment analysis, or opinion mining, is the process of categorizing and identifying the sentiment expressed in a particular text. The need for automatic sentiment retrieval is high, as the number of reviews obtained from internet sources such as Twitter is huge. These reviews or opinions on popular products or events help determine public opinion on an issue. An averaged-histogram model is proposed that handles text classification with a continuous-variable approach. After data cleaning and feature extraction from the reviews, average histograms are constructed for each class (positive and negative), containing a generalized feature representation for that class. Histograms of the test elements are then classified using k-NN, a Bayesian classifier, and an LSTM network. This work is implemented in an Android application integrated with Twitter: the user provides a topic for analysis, and the application shows the percentage of positive tweets in favor of the topic using the Bayesian classifier.
Politycheck - Political Ideology Detection (Natural Language Processing Chall... (Hamza Mahmood)
This document describes a natural language processing challenge to detect the political ideology of political leaders. It uses neural network techniques and keywords to compute a score that determines whether a political figure has a left or right stance. The goal is to identify leaders' stances on specific issues, gauge sentiment expressed during presidential campaigns, and make leaders' liberal or conservative views transparent. An LSTM neural network model is created using Keras, with embedding, LSTM, and dense layers. The model is trained on congressional data labeled with ideology scores and tested on Twitter data. Experiments show the LSTM model improves over bag-of-words and RNN models according to mean-squared-error evaluations. Sentiment analysis of tweets for Trump and Hillary Clinton found scores consistent with
Unsupervised Learning of a Social Network from a Multiple-Source News Corpus (htanev)
This document presents an unsupervised methodology for automatically learning social networks from multiple news sources. It uses a small number of seed syntactic templates to identify relationships between entities. These templates are then used to learn new syntactic patterns from news clusters expressing the same relationships. An efficient graph matching algorithm is then used to extract related entities and build the social network. The method achieves a precision of 0.61 and recall of 0.56 for identifying meeting relationships, and 0.57 precision and 0.10 recall for identifying support relationships.
Risk Prediction for Production of an Enterprise (Editor IJCATR)
Despite all preventive measures, there remains considerable possibility of risk in any project development as well as in enterprise management, and no standard mechanism or methodology is available to assess the risks in project or production management; using precautionary steps, a manager can only avoid risks as far as possible. To address this issue, this paper presents a probabilistic risk assessment model for the production of an enterprise. A Multi-Entity Bayesian Network (MEBN) is used to represent the requirements of production management and to assess the risks inherent in it, where MEBN combines the expressivity of first-order logic with the probabilistic features of a Bayesian network. The Bayesian network represents probabilistic uncertainty and supports reasoning over a probabilistic knowledge base, which is used here to represent the probable risks behind each cause of a risk. The proposed probabilistic model is discussed with the help of a case study that predicts the risks inherent in the production of an enterprise, which depend on various measures such as labour availability, power backup, and transport availability.
In this paper, we introduce the notion of intuitionistic fuzzy semipre generalized connected space, intuitionistic fuzzy semipre generalized super connected space and intuitionistic fuzzy semipre generalized extremally disconnected spaces. We investigate some of their properties.
Sensing Trending Topics in Twitter for Greater Jakarta Area IJECEIAES
Information and communication technology grows so fast nowadays, especially related to the internet. Twitter is one of internet applications that produce a large amount of textual data called tweets. The tweets may represent real-world situation discussed in a community. Therefore, Twitter can be an important media for urban monitoring. The ability to monitor the situations may guide local government to respond quickly or make public policy. Topic detection is an important automatic tool to understand the tweets, for example, using non-negative matrix factorization. In this paper, we conducted a study to implement Twitter as a media for the urban monitoring in Jakarta and its surrounding areas called Greater Jakarta. Firstly, we analyze the accuracy of the detected topics in term of their interpretability level. Next, we visualize the trend of the topics to identify popular topics easily. Our simulations show that the topic detection methods can extract topics in a certain level of accuracy and draw the trends such that the topic monitoring can be conducted easily.
A Survey Of Collaborative Filtering Techniquestengyue5i5j
This document provides a survey of collaborative filtering techniques. It begins with an introduction to collaborative filtering and its main challenges, such as data sparsity, scalability, and synonymy. It then describes three main categories of collaborative filtering techniques: memory-based, model-based, and hybrid approaches. Representative algorithms from each category are discussed and analyzed in terms of their predictive performance and ability to address collaborative filtering challenges. The document concludes with a discussion of evaluating collaborative filtering algorithms and commonly used datasets.
IMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATIONadeij1
The document presents the results of a study that compares several natural language processing (NLP) techniques for sentiment analysis: distilBERT, VADER, and LSTM. DistilBERT achieved the highest accuracy of 92.4% for predicting sentiment polarity in a restaurant review dataset, significantly outperforming VADER (72.3% accuracy) and conventional LSTM approaches (78% accuracy). The document describes the methodology used in the comparative study and provides details on how each NLP technique approaches the task of sentiment analysis.
The document presents a study that analyzes sentiment on Twitter using various classification algorithms. It compares the performance of Naive Bayes, Bayes Net, Discriminative Multinomial Naive Bayes, Sequential Minimal Optimization, Hyperpipes, and Random Forest algorithms on a Twitter sentiment dataset. The study finds that Discriminative Multinomial Naive Bayes and Sequential Minimal Optimization algorithms have the best performance with overall F-scores of 0.769 and 0.75, respectively. The study aims to determine the most accurate and efficient algorithms for Twitter sentiment classification.
Syntactic search relies on keywords contained in a query to find suitable documents, so documents that do not contain the keywords but do contain information related to the query are not retrieved. Spreading activation is an algorithm for finding latent information in a query by exploiting relations between nodes in an associative or semantic network. However, the classical spreading activation algorithm uses all relations of a node in the network, which adds unsuitable information to the query. In this paper, we propose a novel approach for semantic text search, called query-oriented constrained spreading activation, that uses only relations relating to the content of the query to find truly related information. Experiments on a benchmark dataset show that, in terms of the MAP measure, our search engine is 18.9% and 43.8% better, respectively, than the syntactic search and the search using classical constrained spreading activation.
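Constrained spreading activation can be sketched in a few lines. The toy network, relation names, decay factor, and step count below are illustrative assumptions, not the paper's setup; the key point is that activation only flows along the allowed relation types:

```python
# Edges of a small semantic network: (source, relation, target, weight).
edges = [
    ("jakarta", "located_in", "indonesia", 0.9),
    ("jakarta", "has_issue", "flood", 0.8),
    ("flood", "caused_by", "rain", 0.7),
    ("indonesia", "has_city", "bandung", 0.6),
]

def spread(seeds, allowed_relations, decay=0.5, steps=2):
    """Constrained spreading activation: propagate activation from the
    seed nodes, but only along relations relevant to the query."""
    act = dict(seeds)
    for _ in range(steps):
        new = dict(act)
        for src, rel, dst, w in edges:
            if rel in allowed_relations and src in act:
                new[dst] = max(new.get(dst, 0.0), act[src] * w * decay)
        act = new
    return act

# Query about problems in Jakarta: only issue/cause relations are allowed,
# so "indonesia" and "bandung" never receive activation.
act = spread({"jakarta": 1.0}, {"has_issue", "caused_by"})
```

Nodes reached through disallowed relations stay inactive, which is exactly the "constrained" part that keeps unsuitable information out of the expanded query.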
Misconduct disclosure of the intermediates using the trusted authority (eSAT Publishing House)
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
This document discusses using complexity analysis as an advanced method for data mining. It involves 3 steps: 1) Measuring the information content in scatter plots using image entropy. 2) Identifying relationships between variables using mutual information. 3) Repeating steps 1 and 2 for all variable pairs to map dependencies and identify key interrelated and isolated variables. Complexity is defined by the number and nature of relationships, providing insights into system controllability, predictability, and distance from critical complexity. Complexity analysis can extract more useful insights than simple correlations when data is complex, turbulent or chaotic.
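Step 2 of the method above, mutual information between variable pairs, can be sketched directly from joint frequency counts. The paired samples below are invented; the point is that a deterministic relationship yields high MI while independent variables yield zero:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

x = [0, 0, 1, 1] * 25
y_same = x                   # fully determined by x -> 1 bit of MI
y_indep = [0, 1, 0, 1] * 25  # statistically independent of x -> 0 bits
dependent = mutual_information(x, y_same)
independent = mutual_information(x, y_indep)
```

Repeating this over all variable pairs (step 3) gives the dependency map the document describes.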
News Reliability Evaluation using Latent Semantic Analysis (TELKOMNIKA Journal)
The rapid rise and spread of ‘fake news’ has severe implications for society today. Much effort has been directed towards developing methods to verify news reliability on the Internet in recent years. In this paper, an automated news reliability evaluation system is proposed. The system utilizes several Natural Language Processing (NLP) techniques, such as Term Frequency-Inverse Document Frequency (TF-IDF), phrase detection and cosine similarity, in tandem with Latent Semantic Analysis (LSA). A collection of 9203 labelled articles from both reliable and unreliable sources was collected, and a random train-test split was applied to create the training and testing datasets. The final results show 81.87% precision and 86.95% recall, with an accuracy of 73.33%.
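The LSA step in such a pipeline can be sketched with a truncated SVD of a term-document matrix. The toy matrix below is an invented example (the paper's TF-IDF weighting and phrase detection stages are omitted); it shows how two same-topic documents become more similar in the reduced concept space:

```python
import numpy as np

# Term-document count matrix: rows = terms, columns = articles.
# Articles 0 and 1 share "election" vocabulary; article 2 is about weather.
A = np.array([
    [2., 1., 0.],   # "election"
    [1., 2., 0.],   # "vote"
    [0., 0., 4.],   # "storm"
])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                            # number of latent concepts to keep
docs = Vt[:k].T * s[:k]          # documents in the k-dimensional concept space

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_same_topic = cos(docs[0], docs[1])    # raw-count cosine would be 0.8
sim_cross_topic = cos(docs[0], docs[2])
```

Dropping the smallest singular value removes the direction that distinguishes articles 0 and 1, so their concept-space similarity rises to 1 while the cross-topic pair stays orthogonal.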
This document summarizes research on sentiment analysis of Twitter data. It discusses how sentiment analysis can classify tweets as positive, negative, or neutral. It reviews different techniques for sentiment analysis, including machine learning approaches like Naive Bayes classifiers and lexicon-based approaches. The document also describes prior studies that have used sentiment analysis techniques to predict security attacks based on Twitter sentiment and explore improvements in classification accuracy. In general, the document outlines common methods for analyzing sentiment in social media data and highlights past applications of the analysis.
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
Nearly 70% of people are concerned about the propagation of fake news. This paper aims to detect fake news in online articles through the use of semantic features and various machine learning techniques. In this research, we investigated recurrent neural networks vs. the naive Bayes classifier and random forest classifiers using five groups of linguistic features. Evaluated with the real-or-fake dataset from kaggle.com, the best performing model achieved an accuracy of 95.66% using bigram features with the random forest classifier. The fact that bigrams outperform unigrams, trigrams, and quadgrams shows that word pairs, as opposed to single words or phrases, best indicate the authenticity of news.
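The bigram features that won out in this study are simple to extract. A minimal sketch of the feature-extraction step only (the sentence is invented, and the downstream random forest classifier from the paper is not shown):

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the moon landing was faked says blog".split()
bigram_counts = Counter(ngrams(tokens, 2))  # feature vector for one article
```

Each article's bigram counts become one row of the feature matrix that the classifier is trained on; unigrams, trigrams, and quadgrams come from the same function with a different `n`.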
Semantic similarity, and the semantic relatedness measure in particular, is very important in the current scenario due to the huge demand for natural-language-processing-based applications such as chatbots, and information retrieval systems such as knowledge-base-based FAQ systems. Current approaches generally use similarity measures that do not use the context-sensitive relationships between the words. This leads to erroneous similarity predictions and is not of much use in real-life applications. This work proposes a novel approach that gives an accurate relatedness measure of any two words in a sentence by taking their context into consideration. This context correction results in a more accurate similarity prediction, which in turn results in higher accuracy of information retrieval systems.
Exploring the Current Trends and Future Prospects in Terrorist Network Mining
In today’s era of hi-tech technologies, criminals are easily fulfilling their inhuman goals against mankind. Thus, the security of civilians has become significantly important. In this regard, the law-enforcement agencies are aiming to prevent future attacks. To do so, terrorist networks are being analyzed using data mining techniques. One such technique is social network analysis, which studies terrorist networks to identify relationships and associations that may exist between terrorist nodes. Terrorist activities can also be detected by analyzing Web traffic content. This paper studies social network analysis and web traffic content and explores various ways of identifying terrorist activities.
Prediction of Reaction towards Textual Posts in Social Networks (Mohamed El-Geish)
Posting on social networks could be a gratifying or a terrifying experience depending on the reaction the post and its author —by association— receive from the readers. To better understand what makes a post popular, this project inquires into the factors that determine the number of likes, comments, and shares a textual post gets on LinkedIn; and finds a predictor function that can estimate those quantitative social gestures.
The document summarizes a student project to build a question answering search engine called "Ask Yahoo!" that retrieves answers directly from the Yahoo Answers database. The project involved:
1. Cleaning and indexing over 4 million Yahoo Answers questions and answers to create searchable indexes.
2. Developing a search engine frontend where users can enter questions and receive top matching answers below the search box.
3. Implementing a ranking system using TF-IDF that scores documents based on user query terms.
4. Evaluating performance on a test set where friends rewrote questions to check if the top answer matched the original question. The system achieved a score of 0.7 for correctly answering questions after query
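The TF-IDF ranking described in step 3 can be sketched as follows; the documents and query below are invented examples, not from the Yahoo Answers index:

```python
import math
from collections import Counter

docs = {
    "d1": "how to train a dog to sit",
    "d2": "best dog food for puppies",
    "d3": "how to file income tax online",
}

def tfidf_scores(query):
    """Score each document by summed tf * idf over the query terms."""
    tokenized = {d: text.split() for d, text in docs.items()}
    N = len(docs)
    df = Counter()                       # document frequency per term
    for toks in tokenized.values():
        df.update(set(toks))
    scores = {}
    for d, toks in tokenized.items():
        tf = Counter(toks)
        scores[d] = sum((tf[w] / len(toks)) * math.log(N / df[w])
                        for w in query.split() if df[w] > 0)
    return scores

scores = tfidf_scores("train a dog")
best = max(scores, key=scores.get)       # top answer returned to the user
```

Terms that appear in many documents (like "to") earn a low idf, so rare query terms dominate the ranking, which is why `d1` wins here.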
IRJET: Fake News Detection using Logistic Regression
1) The document discusses a study that uses logistic regression to classify news articles as real or fake. It outlines the methodology which includes data preprocessing, feature extraction using bag-of-words and TF-IDF, and using a logistic regression classifier to predict fake news.
2) The model achieved an accuracy of approximately 72% at classifying news as real or fake when using TF-IDF features and logistic regression.
3) The study aims to address the growing issue of fake news proliferation online by developing a computational method for identifying unreliable news sources.
Evolving Swings (topics) from Social Streams using Probability Model
This document presents a probability model for detecting evolving topics from social media streams. It focuses on the social aspects of posts reflected in user mentioning behavior, rather than textual content. The model captures the number of mentions per post and frequency of mentioned users. It analyzes individual posting anomalies and aggregates anomaly scores. SDNML change point analysis and burst detection are then used to identify evolving topics from the aggregated scores. Experimental results show that link-based detection using this approach performs better than key-based detection using textual content alone. The model overcomes limitations of prior frequency-based approaches and can detect topics from both textual and non-textual social media posts.
Sentiment analysis, or opinion mining, is a process of categorizing and identifying the sentiment expressed in a particular text. The need for automatic sentiment retrieval of text is quite high, as the number of reviews obtained from Internet sources like Twitter is huge. These reviews or opinions on popular products or events help in determining public opinion towards an issue. An averaged histogram model is proposed that deals with text classification in a continuous-variable approach. After data cleaning and feature extraction from the reviews, average histograms are constructed for every class, containing a generalized feature representation of that particular class, namely positive and negative. The histogram of every test element is then classified using k-NN, a Bayesian classifier and an LSTM network. This work is then implemented in an Android application integrated with Twitter. The user provides the topic for analysis, and the application shows the result as the percentage of positive tweets in favor of the topic using the Bayesian classifier.
Politycheck - Political Ideology Detection (Natural Language Processing Chall...) (Hamza Mahmood)
This document describes a natural language processing challenge to detect the political ideology of political leaders. It involves using neural network techniques and keywords to compute a score that determines whether a political figure has a leftist or rightist stance. The goal is to identify the stance of political leaders on specific issues, gauge sentiment expressed during presidential campaigns, and make leaders' liberal or conservative views transparent. An LSTM neural network model is created using Keras, involving embedding, LSTM, and dense layers. The model is trained on congressional data labeled with ideology scores and tested on Twitter data. Experiments show the LSTM model improves over bag-of-words and RNN models according to mean squared error evaluations. Sentiment analysis of tweets for Trump and Hillary Clinton found scores consistent with
Unsupervised Learning of a Social Network from a Multiple-Source News Corpus
This document presents an unsupervised methodology for automatically learning social networks from multiple news sources. It uses a small number of seed syntactic templates to identify relationships between entities. These templates are then used to learn new syntactic patterns from news clusters expressing the same relationships. An efficient graph matching algorithm is then used to extract related entities and build the social network. The method achieves a precision of 0.61 and recall of 0.56 for identifying meeting relationships, and 0.57 precision and 0.10 recall for identifying support relationships.
Risk Prediction for Production of an Enterprise
Despite all preventive measures, there is considerable possibility of risk in any project development as well as in enterprise management. There is no standard mechanism or methodology available to assess the risks in any project or production management; using some precautionary steps, a manager can only avoid the risks as far as possible. To address this issue, this paper presents a probabilistic risk assessment model for the production of an enterprise. For this, a Multi-Entity Bayesian Network (MEBN) has been used to represent the requirements for production management as well as to assess the risks inherent in production management, where MEBN combines the expressivity of first-order logic with the probabilistic features of a Bayesian network. A Bayesian network provides the means to represent probabilistic uncertainty and to reason about a probabilistic knowledge base, which is used here to represent the probable risks behind each cause of a risk. The proposed probabilistic model is discussed with the help of a case study, which is used to predict risks inherent in the production of an enterprise, depending upon various measures like labour availability, power backup, transport availability, etc.
In this paper, we introduce the notion of intuitionistic fuzzy semipre generalized connected space, intuitionistic fuzzy semipre generalized super connected space and intuitionistic fuzzy semipre generalized extremally disconnected spaces. We investigate some of their properties.
One of the most popular areas of research is wireless communication. A Mobile Ad Hoc Network (MANET) is a network of wireless mobile nodes that is infrastructure-less and self-organizing. With its wireless and distributed nature, it is exposed to several security threats. One of the threats in a MANET is the wormhole attack, in which a pair of attackers forms a virtual link, thereby recording and replaying the wireless transmission. This paper presents types of wormhole attack and also includes different techniques for detecting wormhole attacks in MANETs.
Online Social Network (OSN) sites act as a medium for users to spread their views, activities and thoughts to their communities. The content of these networks is spread over the web, so it is hard to oversee by human decision alone. Currently, these sites do not provide any mechanism to address privacy concerns about the data associated with each user, and as a result many users lack ownership control over their data. In this paper, we propose AC2P (Activity Control-Access Control Protocol) for information control on the web. In addition, a tag refinement strategy detects illegal tagging of images and sends notifications about a particular image spread within different communities/groups. These techniques reduce the risk of information flow and avoid unwanted tagging of images.
Multi-agent systems (autonomous agents, or agents) and knowledge discovery (or data mining) are two active areas in information technology. A profound insight of bringing these two communities together has unveiled a tremendous potential for new opportunities and wider applications through the synergy of agents and data mining. Multi-agent systems (MAS) often deal with complex applications that require distributed problem solving. In many applications, the individual and collective behavior of the agents depends on the observed data from distributed data sources. Data mining technology has emerged for identifying patterns and trends from large quantities of data. The increasing demand to scale up to massive data sets, inherently distributed over a network with limited bandwidth and computational resources, motivated the development of distributed data mining (DDM). Distributed data mining originated from the need for mining over decentralized data sources. DDM is expected to perform partial analysis of data at individual sites and then send the outcome as a partial result to other sites, where it is sometimes required to be aggregated into the global result.
Security for Effective Data Storage in Multi Clouds
Cloud computing is a technology that uses the internet and central remote servers to maintain data and applications. It allows consumers and businesses to use applications without installation and to access their personal files from any computer with internet access. This technology allows for much more efficient computing by centralizing data storage, processing and bandwidth. The use of cloud computing has increased rapidly in many organizations, as it provides many benefits in terms of low cost and accessibility of data. Ensuring the security of cloud computing is a major factor in the cloud computing environment, as users often store sensitive information with cloud storage providers, but these providers may be untrusted. Dealing with “single cloud” providers is predicted to become less popular with customers due to risks of service availability failure and the possibility of malicious insiders in the single cloud. A movement towards “multi-clouds”, in other words “interclouds” or “cloud-of-clouds”, has emerged recently. This paper surveys recent research related to single-cloud and multi-cloud security and addresses possible solutions. It is found that research into the use of multi-cloud providers to maintain security has received less attention from the research community than has the use of single clouds. This work aims to promote the use of multi-clouds due to their ability to reduce the security risks that affect cloud computing users.
This document describes a study that developed an Android application to predict and suggest measures for diabetes using data mining techniques. The study used the Pima Indian diabetes dataset to build a decision tree classification model using the C4.5 algorithm to predict whether a person is diabetic or not based on their attributes. The most significant attributes identified for prediction were plasma glucose level, body mass index, diabetes pedigree function, and insulin level. The developed Android app allows a user to enter their details, which are then run through the decision tree model to predict their diabetes status and provide suggested measures if predicted to be diabetic. The goal of the study was to create a mobile application to help individuals assess their risk of diabetes and maintain healthy habits.
Wireless data broadcast is an efficient way of disseminating data to users in mobile computing environments. From the server’s point of view, how to place the data items on channels is a crucial issue, with the objective of minimizing the average access time and tuning time. Similarly, how to schedule the data retrieval process for a given request at the client side, such that all the requested items can be downloaded in a short time, is also an important problem. In this paper, we investigate multi-item data retrieval scheduling in push-based multichannel broadcast environments. The most important issues in mobile computing are energy efficiency and query response efficiency; however, in data broadcast the objectives of reducing access latency and energy cost can contradict each other. Consequently, we define two new problems, named the Minimum Cost Data Retrieval (MCDR) problem and the Large Number Data Retrieval (LNDR) problem. We also develop a heuristic algorithm to download a large number of items efficiently. When there is no replicated item in a broadcast cycle, we show that an optimal retrieval schedule can be obtained in polynomial time.
Random Valued Impulse Noise Elimination using Neural Filter
A neural filtering technique is proposed in this paper for restoring images extremely corrupted by random valued impulse noise. The proposed intelligent filter operates in two stages. In the first stage, the corrupted image is filtered by applying an asymmetric trimmed median filter. The asymmetric trimmed median filtered output image is suitably combined with a feed-forward neural network in the second stage. The internal parameters of the feed-forward neural network are adaptively optimized by training on three well-known images. This is quite effective in eliminating random valued impulse noise. Simulation results show that the proposed filter is superior in terms of eliminating impulse noise as well as preserving edges and fine details of digital images, and the results are compared with other existing nonlinear filters.
Image enhancement plays an important role in vision applications. Recently a lot of work has been performed in the field of image enhancement, and many techniques have been proposed for enhancing digital images. This paper presents a comparative analysis of various image enhancement techniques and shows that fuzzy logic and histogram-based techniques give quite effective results compared with the other available techniques. It ends with suitable future directions for further improving the fuzzy-based image enhancement technique. In the proposed technique, images other than low-contrast images are enhanced as well, by balancing the stretching parameter (K) according to the color contrast. The proposed technique is also designed to restore edges degraded as a result of contrast enhancement.
Grid computing is a modularized way of structuring network resources so as to share information and resources and to solve computationally intensive problems. It is a part of distributed processing that allows the distribution of a problem over multiple computational resources. The formation of the grid decides the computation cost and the performance metrics of the operation to be carried out. Grid computing favours the formation of an ad hoc structure, formed at the time of request (it does not hold an infrastructure), termed a virtual organization. This paper presents dynamic source routing for modeling virtual organizations.
Assistive Examination System for Visually Impaired
This paper presents the design of a voice-enabled examination system which can be used by visually challenged students. The system uses Text-to-Speech (TTS) and Speech-to-Text (STT) technology. This text-to-speech and speech-to-text web-based academic testing software provides an interface for blind students, enhancing their educational experience by giving them a tool to take exams. The system will aid the differently-abled to appear for online tests and enable them to come to par with other students. It can also be used by students with learning disabilities, or by people who wish to take an examination in a combined auditory and visual way.
A Formal Machine Learning or Multi Objective Decision Making System for Deter...
Decision-making typically needs mechanisms to compromise among opposing norms. When multiple objectives are involved in machine learning, a vital step is to relate the weights of the individual objectives to system-level performance. Determining the weights of multiple objectives is an analysis process, and it has typically been treated as an optimization problem. However, our preliminary investigation has shown that existing methodologies for managing the weights of multiple objectives have obvious limitations: when the determination of weights is treated as a single optimization problem, the resulting solution is limited and can even be unreliable when knowledge about the multiple objectives is incomplete, for example due to poor data integrity. The constraints on the weights are also discussed. Variable weights are natural in decision-making processes, so we aim to develop a systematic methodology for determining variable weights of multiple objectives. The roles of weights in multi-objective decision-making or machine learning are analyzed, and the weights are then determined with the help of a standard neural network.
This document proposes an adaptive steganography technique for hiding secret data in digital images. The technique uses variable bit length embedding based on wavelet coefficients of the cover image. A logistic map is used to generate a secret key, which determines the RGB color planes and blocks used for data hiding. Wavelet coefficients are classified into ranges, and the number of bits hidden corresponds to the coefficient value range. Extraction involves selecting the same coefficients based on the key and calculating the hidden bits. The technique aims to improve security, capacity and imperceptibility compared to existing constant bit length methods. Evaluation shows the proposed method provides good security since variable bits are hidden randomly using the secret key.
Risk-Aware Response Mechanism with Extended D-S theory
Mobile Ad hoc Networks (MANET) have a dynamic network infrastructure and are vulnerable to all types of attacks. Among these attacks, routing attacks receive more attention because they change the topology itself and cause more damage to the MANET. Although many intrusion detection systems are available to diminish these critical attacks, existing ones cause unexpected network partitions and additional damage to the network infrastructure, leading to uncertainty in finding routing attacks in MANET. In this paper, we propose an adaptive risk-aware response mechanism with extended Dempster-Shafer theory in MANET to identify routing attacks and malicious nodes. Our technique finds malicious nodes with a degree of evidence from expert knowledge and detects the important factors for each node. It blacklists all malicious nodes so that they may not enter the network again.
HIDDEN MARKOV MODEL APPROACH TOWARDS EMOTION DETECTION FROM SPEECH S...
Emotions carry the token indicating a human’s mental state. Understanding the emotion exhibited becomes difficult for people suffering from autism and alexithymia. Assessment of emotions can also be beneficial in interactions involving a human and a machine. A system is developed to recognize the universally accepted emotions such as happy, anger, sad, disgust, fear and surprise. The gender of the speaker helps to obtain better clarity for identifying the emotion. Hidden Markov Model serves the purpose of gender identification.
This document discusses 10 examples of using network analysis techniques in various domains:
1. Using social network analysis to map the workforce and labor supply as a complex system.
2. Analyzing social network and interest graph data to power future shopping through identifying customer segments and influencers.
3. Using social network diagrams by drug marketers to locate influential doctors by identifying prescribing patterns and relationships between doctors.
This document presents a new model called EQUIRS (Explicitly Query Understanding Information Retrieval System) based on Hidden Markov Models (HMM) to improve natural language processing for text query information retrieval. The proposed EQUIRS system is compared to previous fuzzy clustering methods. Experimental results on a dataset of 900 files across 5 categories show that EQUIRS has higher accuracy than fuzzy clustering, as measured by precision, recall, F-measure, though it has longer training and searching times. The document concludes that EQUIRS is an effective approach for information retrieval based on HMM.
This document proposes a Markov-based forecasting model for predicting human resource internal supply in enterprises. It begins by discussing the importance of human resource forecasting for manpower planning. It then reviews existing forecasting methods and focuses on using a Markov chain approach. The model involves collecting historical personnel change data, calculating transition probabilities between positions, and using the probabilities to predict future staffing levels. The model aims to help enterprises better forecast internal human resource supply to support sustainable development.
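A minimal sketch of the Markov internal-supply forecast described above; the position states, transition probabilities, and headcounts are hypothetical, not the paper's data:

```python
# States: junior, senior, manager, left-company. Each row sums to 1.
P = [
    [0.70, 0.20, 0.00, 0.10],   # junior  -> stay / promote / - / leave
    [0.00, 0.75, 0.15, 0.10],   # senior  -> stay / promote / leave
    [0.00, 0.00, 0.90, 0.10],   # manager -> stay / leave
    [0.00, 0.00, 0.00, 1.00],   # left (absorbing state)
]

def step(staff, P):
    """One planning period: redistribute headcount by transition probabilities."""
    n = len(staff)
    return [sum(staff[i] * P[i][j] for i in range(n)) for j in range(n)]

staff = [100.0, 40.0, 10.0, 0.0]    # current headcount per state
next_year = step(staff, P)          # expected headcount after one period
```

Iterating `step` projects staffing levels further out; in practice the transition probabilities are estimated from historical personnel change records, as the document describes.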
San Francisco Crime Analysis Classification Kaggle Contest (Sameer Darekar)
This document presents a project to analyze and predict crime in San Francisco using data mining techniques. The objectives are to analyze the spatial and temporal relationships of crime, predict the category of crime in a location based on variables like location and date, and suggest safest paths between places. The authors describe using naive Bayes, decision tree, random forest and support vector machine classifiers on a dataset of over 878,000 crime incidents to classify crimes and identify patterns. Cross-validation is used to evaluate the classifiers on the training data. The results are intended to help the police department understand crime patterns and deploy resources more efficiently.
A SURVEY OF MARKOV CHAIN MODELS IN LINGUISTICS APPLICATIONS
Markov chain theory is an important tool in applied probability that is quite useful in modeling real-world computing applications. For a long time, researchers have used Markov chains for data modeling in a wide range of applications belonging to different fields such as computational linguistics, image processing, communications, bioinformatics, finance systems, etc. This paper explores Markov chain theory and its extension, hidden Markov models (HMM), in natural language processing (NLP) applications. It also presents some aspects related to Markov chains and HMMs, such as creating transition matrices, calculating data sequence probabilities, and extracting the hidden states.
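Two of the aspects mentioned, creating a transition matrix and calculating a sequence probability, can be sketched as follows; the part-of-speech tag sequence is an invented example:

```python
from collections import defaultdict

def transition_matrix(seq):
    """Estimate P(next | current) from an observed state sequence."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}

def sequence_probability(seq, P):
    """Probability of a sequence under a first-order Markov chain
    (conditioned on the starting state)."""
    p = 1.0
    for a, b in zip(seq, seq[1:]):
        p *= P.get(a, {}).get(b, 0.0)
    return p

tags = ["DET", "NOUN", "VERB", "DET", "NOUN", "VERB", "DET", "NOUN"]
P = transition_matrix(tags)
```

Unseen transitions get probability zero here; real NLP systems usually smooth these estimates, and an HMM adds hidden states with emission probabilities on top of this chain.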
This document provides instructions for an assignment on data and file structures for the MCS-021 course. It outlines four questions to answer for the assignment, which is worth 100 marks total and 25% of the course grade. Students must answer all four questions, with each question worth 20 marks. The assignment should be submitted by October 15th, 2013 for the July 2013 session or April 15th, 2014 for the January 2014 session. Question 1 asks students to write an algorithm for implementing doubly linked lists.
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING
This document summarizes a research paper that aims to detect fake news articles through machine learning techniques. It compares random forest, naive bayes, and recurrent neural network classifiers using linguistic features including n-grams and word embeddings. The best performing model was random forest with bigram features, achieving 95.66% accuracy on a real or fake news dataset. Bigrams were found to outperform other n-grams as indicators of news authenticity.
soft computing BTU MCA 3rd SEM unit 1 .pptx
This document discusses hard computing and soft computing. Hard computing uses deterministic algorithms and mathematical models to produce accurate and predictable results, while soft computing can handle imprecision, uncertainty, and ambiguity. Soft computing techniques include fuzzy logic, neural networks, genetic algorithms, probabilistic reasoning, and evolutionary computation. These techniques aim to mimic human-like reasoning by tolerating uncertainty, learning and adapting, and integrating multiple methods. Examples of evolutionary computation algorithms provided are genetic algorithms, genetic programming, evolutionary strategies, differential evolution, and particle swarm optimization. Neural networks, ant colony optimization, and fuzzy logic are also summarized.
The document provides information about the syllabus for the Data Analytics (KIT-601) course. It includes 5 units that will be covered: Introduction to Data Analytics, Data Analysis techniques including regression modeling and multivariate analysis, Mining Data Streams, Frequent Itemsets and Clustering, and Frameworks and Visualization. It lists the course outcomes and Bloom's taxonomy levels. It also provides details on the topics to be covered in each unit, including proposed lecture hours, textbooks, and an evaluation scheme. The syllabus aims to discuss concepts of data analytics and apply techniques such as classification, regression, clustering, and frequent pattern mining on data.
Calculation of the Minimum Computational Complexity Based on Information Entropyijcsa
In order to find out the limiting speed of solving a specific problem using computer, this essay provides a method based on information entropy. The relationship between the minimum computational complexity and information entropy change is illustrated. A few examples are served as evidence of such connection. Meanwhile some basic rules of modeling problems are established. Finally, the nature of solving problems with computer programs is disclosed to support this theory and a redefinition of information entropy in this filed is proposed. This will develop a new field of science.
A unified prediction of computer virus spread in connected networksUltraUploader
This document presents two models for predicting the spread of computer viruses in connected networks: a discrete Markov model and a continuous differential equation model. The Markov model captures short-term behavior dependent on initial conditions, providing extinction probabilities. The differential equation model predicts long-term behavior by defining a boundary between parameter values where the virus will survive or die out. Simulations show good agreement between the models. The models provide new insights into controlling computer virus persistence on networks.
Sentence Validation by Statistical Language Modeling and Semantic RelationsEditor IJCATR
This paper deals with Sentence Validation - a sub-field of Natural Language Processing. It finds various applications in
different areas as it deals with understanding the natural language (English in most cases) and manipulating it. So the effort is on
understanding and extracting important information delivered to the computer and make possible efficient human computer
interaction. Sentence Validation is approached in two ways - by Statistical approach and Semantic approach. In both approaches
database is trained with the help of sample sentences of Brown corpus of NLTK. The statistical approach uses trigram technique based
on N-gram Markov Model and modified Kneser-Ney Smoothing to handle zero probabilities. As another testing on statistical basis,
tagging and chunking of the sentences having named entities is carried out using pre-defined grammar rules and semantic tree parsing,
and chunked off sentences are fed into another database, upon which testing is carried out. Finally, semantic analysis is carried out by
extracting entity relation pairs which are then tested. After the results of all three approaches is compiled, graphs are plotted and
variations are studied. Hence, a comparison of three different models is calculated and formulated. Graphs pertaining to the
probabilities of the three approaches are plotted, which clearly demarcate them and throw light on the findings of the project.
The document discusses algorithms for tree data structures. It describes AVL trees as self-balancing binary search trees where the heights of subtrees can differ by at most one. It also describes red-black trees, noting they are similar to AVL trees but use color attributes (red or black) to balance the tree. The key differences are that AVL trees are generally faster for lookup-intensive applications as they are more rigidly balanced, while red-black tree insertions and deletions may require fewer rotations than AVL trees.
A data mining tool for the detection of suicide in social networksYassine Bensaoucha
This document describes a dissertation that developed a program to detect suicidal tendencies in users on Twitter through data mining and text classification techniques. The program first collects and preprocesses tweets, then classifies them using naive Bayes classifiers into three categories: positive, negative, and suicidal. It analyzes the results to determine if a given user has suicidal tendencies based on the percentage of tweets classified in each category. While initial results were promising, future work could compare this approach to other classifiers and potentially combine it with decision tree classification.
Random Keying Technique for Security in Wireless Sensor Networks Based on Mem...ijcsta
The document proposes a random keying technique combined with memetics concepts to provide security in wireless sensor networks. It involves randomly selecting keys from ranges distributed from the base station to cluster heads and nodes. When a node communicates, it selects keys that undergo crossover and mutation to generate header and trailer keys. The receiving node verifies packets by applying the same operations to the header keys and comparing the results to the trailer keys. Simulations showed this technique effectively combats spoofing attacks while being energy efficient compared to cryptographic methods.
An application of artificial intelligent neural network and discriminant anal...Alexander Decker
This document presents a study that compares the predictive abilities of artificial neural networks and linear discriminant analysis for credit scoring. A credit dataset from a Nigerian bank with 200 applicants and 15 variables is used to build both neural network and linear discriminant models. The models are evaluated based on measures like accuracy, Wilks' lambda, and canonical correlation. Key findings are that the neural network model performs slightly better with less misclassification cost. However, variable selection is important for both models' success. Age, length of service, and other borrowing are found to be the most important predictor variables.
Text Mining in Digital Libraries using OKAPI BM25 ModelEditor IJCATR
The emergence of the internet has made vast amounts of information available and easily accessible online. As a result, most libraries have digitized their content in order to remain relevant to their users and to keep pace with the advancement of the internet. However, these digital libraries have been criticized for using inefficient information retrieval models that do not perform relevance ranking to the retrieved results. This paper proposed the use of OKAPI BM25 model in text mining so as means of improving relevance ranking of digital libraries. Okapi BM25 model was selected because it is a probability-based relevance ranking algorithm. A case study research was conducted and the model design was based on information retrieval processes. The performance of Boolean, vector space, and Okapi BM25 models was compared for data retrieval. Relevant ranked documents were retrieved and displayed at the OPAC framework search page. The results revealed that Okapi BM 25 outperformed Boolean model and Vector Space model. Therefore, this paper proposes the use of Okapi BM25 model to reward terms according to their relative frequencies in a document so as to improve the performance of text mining in digital libraries.
Green Computing, eco trends, climate change, e-waste and eco-friendlyEditor IJCATR
This document discusses green computing practices and sustainable IT services. It provides an overview of factors driving adoption of green computing to reduce costs and environmental impact of data centers, such as rising energy costs and density. Green strategies discussed include improving infrastructure efficiency, power management, thermal management, efficient product design, and virtualization to optimize resource utilization. The document examines how green computing aims to lower costs and environmental footprint, and how sustainable IT services take a broader approach considering economic, environmental and social impacts.
Policies for Green Computing and E-Waste in NigeriaEditor IJCATR
Computers today are an integral part of individuals’ lives all around the world, but unfortunately these devices are toxic to the environment given the materials used, their limited battery life and technological obsolescence. Individuals are concerned about the hazardous materials ever present in computers, even if the importance of various attributes differs, and that a more environment -friendly attitude can be obtained through exposure to educational materials. In this paper, we aim to delineate the problem of e-waste in Nigeria and highlight a series of measures and the advantage they herald for our country and propose a series of action steps to develop in these areas further. It is possible for Nigeria to have an immediate economic stimulus and job creation while moving quickly to abide by the requirements of climate change legislation and energy efficiency directives. The costs of implementing energy efficiency and renewable energy measures are minimal as they are not cash expenditures but rather investments paid back by future, continuous energy savings.
Performance Evaluation of VANETs for Evaluating Node Stability in Dynamic Sce...Editor IJCATR
Vehicular ad hoc networks (VANETs) are a favorable area of exploration which empowers the interconnection amid the movable vehicles and between transportable units (vehicles) and road side units (RSU). In Vehicular Ad Hoc Networks (VANETs), mobile vehicles can be organized into assemblage to promote interconnection links. The assemblage arrangement according to dimensions and geographical extend has serious influence on attribute of interaction .Vehicular ad hoc networks (VANETs) are subclass of mobile Ad-hoc network involving more complex mobility patterns. Because of mobility the topology changes very frequently. This raises a number of technical challenges including the stability of the network .There is a need for assemblage configuration leading to more stable realistic network. The paper provides investigation of various simulation scenarios in which cluster using k-means algorithm are generated and their numbers are varied to find the more stable configuration in real scenario of road.
Optimum Location of DG Units Considering Operation ConditionsEditor IJCATR
The optimal sizing and placement of Distributed Generation units (DG) are becoming very attractive to researchers these days. In this paper a two stage approach has been used for allocation and sizing of DGs in distribution system with time varying load model. The strategic placement of DGs can help in reducing energy losses and improving voltage profile. The proposed work discusses time varying loads that can be useful for selecting the location and optimizing DG operation. The method has the potential to be used for integrating the available DGs by identifying the best locations in a power system. The proposed method has been demonstrated on 9-bus test system.
Analysis of Comparison of Fuzzy Knn, C4.5 Algorithm, and Naïve Bayes Classifi...Editor IJCATR
Early detection of diabetes mellitus (DM) can prevent or inhibit complication. There are several laboratory test that must be done to detect DM. The result of this laboratory test then converted into data training. Data training used in this study generated from UCI Pima Database with 6 attributes that were used to classify positive or negative diabetes. There are various classification methods that are commonly used, and in this study three of them were compared, which were fuzzy KNN, C4.5 algorithm and Naïve Bayes Classifier (NBC) with one identical case. The objective of this study was to create software to classify DM using tested methods and compared the three methods based on accuracy, precision, and recall. The results showed that the best method was Fuzzy KNN with average and maximum accuracy reached 96% and 98%, respectively. In second place, NBC method had respective average and maximum accuracy of 87.5% and 90%. Lastly, C4.5 algorithm had average and maximum accuracy of 79.5% and 86%, respectively.
Web Scraping for Estimating new Record from Source SiteEditor IJCATR
Study in the Competitive field of Intelligent, and studies in the field of Web Scraping, have a symbiotic relationship mutualism. In the information age today, the website serves as a main source. The research focus is on how to get data from websites and how to slow down the intensity of the download. The problem that arises is the website sources are autonomous so that vulnerable changes the structure of the content at any time. The next problem is the system intrusion detection snort installed on the server to detect bot crawler. So the researchers propose the use of the methods of Mining Data Records and the method of Exponential Smoothing so that adaptive to changes in the structure of the content and do a browse or fetch automatically follow the pattern of the occurrences of the news. The results of the tests, with the threshold 0.3 for MDR and similarity threshold score 0.65 for STM, using recall and precision values produce f-measure average 92.6%. While the results of the tests of the exponential estimation smoothing using ? = 0.5 produces MAE 18.2 datarecord duplicate. It slowed down to 3.6 datarecord from 21.8 datarecord results schedule download/fetch fix in an average time of occurrence news.
Evaluating Semantic Similarity between Biomedical Concepts/Classes through S...Editor IJCATR
Most of the existing semantic similarity measures that use ontology structure as their primary source can measure semantic similarity between concepts/classes using single ontology. The ontology-based semantic similarity techniques such as structure-based semantic similarity techniques (Path Length Measure, Wu and Palmer’s Measure, and Leacock and Chodorow’s measure), information content-based similarity techniques (Resnik’s measure, Lin’s measure), and biomedical domain ontology techniques (Al-Mubaid and Nguyen’s measure (SimDist)) were evaluated relative to human experts’ ratings, and compared on sets of concepts using the ICD-10 “V1.0” terminology within the UMLS. The experimental results validate the efficiency of the SemDist technique in single ontology, and demonstrate that SemDist semantic similarity techniques, compared with the existing techniques, gives the best overall results of correlation with experts’ ratings.
Semantic Similarity Measures between Terms in the Biomedical Domain within f...Editor IJCATR
The techniques and tests are tools used to define how measure the goodness of ontology or its resources. The similarity between biomedical classes/concepts is an important task for the biomedical information extraction and knowledge discovery. However, most of the semantic similarity techniques can be adopted to be used in the biomedical domain (UMLS). Many experiments have been conducted to check the applicability of these measures. In this paper, we investigate to measure semantic similarity between two terms within single ontology or multiple ontologies in ICD-10 “V1.0” as primary source, and compare my results to human experts score by correlation coefficient.
A Strategy for Improving the Performance of Small Files in Openstack Swift Editor IJCATR
This is an effective way to improve the storage access performance of small files in Openstack Swift by adding an aggregate storage module. Because Swift will lead to too much disk operation when querying metadata, the transfer performance of plenty of small files is low. In this paper, we propose an aggregated storage strategy (ASS), and implement it in Swift. ASS comprises two parts which include merge storage and index storage. At the first stage, ASS arranges the write request queue in chronological order, and then stores objects in volumes. These volumes are large files that are stored in Swift actually. During the short encounter time, the object-to-volume mapping information is stored in Key-Value store at the second stage. The experimental results show that the ASS can effectively improve Swift's small file transfer performance.
Integrated System for Vehicle Clearance and RegistrationEditor IJCATR
Efficient management and control of government's cash resources rely on government banking arrangements. Nigeria, like many low income countries, employed fragmented systems in handling government receipts and payments. Later in 2016, Nigeria implemented a unified structure as recommended by the IMF, where all government funds are collected in one account would reduce borrowing costs, extend credit and improve government's fiscal policy among other benefits to government. This situation motivated us to embark on this research to design and implement an integrated system for vehicle clearance and registration. This system complies with the new Treasury Single Account policy to enable proper interaction and collaboration among five different level agencies (NCS, FRSC, SBIR, VIO and NPF) saddled with vehicular administration and activities in Nigeria. Since the system is web based, Object Oriented Hypermedia Design Methodology (OOHDM) is used. Tools such as Php, JavaScript, css, html, AJAX and other web development technologies were used. The result is a web based system that gives proper information about a vehicle starting from the exact date of importation to registration and renewal of licensing. Vehicle owner information, custom duty information, plate number registration details, etc. will also be efficiently retrieved from the system by any of the agencies without contacting the other agency at any point in time. Also number plate will no longer be the only means of vehicle identification as it is presently the case in Nigeria, because the unified system will automatically generate and assigned a Unique Vehicle Identification Pin Number (UVIPN) on payment of duty in the system to the vehicle and the UVIPN will be linked to the various agencies in the management information system.
Assessment of the Efficiency of Customer Order Management System: A Case Stu...Editor IJCATR
The Supermarket Management System deals with the automation of buying and selling of good and services. It includes both sales and purchase of items. The project Supermarket Management System is to be developed with the objective of making the system reliable, easier, fast, and more informative.
Energy-Aware Routing in Wireless Sensor Network Using Modified Bi-Directional A*Editor IJCATR
Energy is a key component in the Wireless Sensor Network (WSN)[1]. The system will not be able to run according to its function without the availability of adequate power units. One of the characteristics of wireless sensor network is Limitation energy[2]. A lot of research has been done to develop strategies to overcome this problem. One of them is clustering technique. The popular clustering technique is Low Energy Adaptive Clustering Hierarchy (LEACH)[3]. In LEACH, clustering techniques are used to determine Cluster Head (CH), which will then be assigned to forward packets to Base Station (BS). In this research, we propose other clustering techniques, which utilize the Social Network Analysis approach theory of Betweeness Centrality (BC) which will then be implemented in the Setup phase. While in the Steady-State phase, one of the heuristic searching algorithms, Modified Bi-Directional A* (MBDA *) is implemented. The experiment was performed deploy 100 nodes statically in the 100x100 area, with one Base Station at coordinates (50,50). To find out the reliability of the system, the experiment to do in 5000 rounds. The performance of the designed routing protocol strategy will be tested based on network lifetime, throughput, and residual energy. The results show that BC-MBDA * is better than LEACH. This is influenced by the ways of working LEACH in determining the CH that is dynamic, which is always changing in every data transmission process. This will result in the use of energy, because they always doing any computation to determine CH in every transmission process. In contrast to BC-MBDA *, CH is statically determined, so it can decrease energy usage.
Security in Software Defined Networks (SDN): Challenges and Research Opportun...Editor IJCATR
In networks, the rapidly changing traffic patterns of search engines, Internet of Things (IoT) devices, Big Data and data centers has thrown up new challenges for legacy; existing networks; and prompted the need for a more intelligent and innovative way to dynamically manage traffic and allocate limited network resources. Software Defined Network (SDN) which decouples the control plane from the data plane through network vitalizations aims to address these challenges. This paper has explored the SDN architecture and its implementation with the OpenFlow protocol. It has also assessed some of its benefits over traditional network architectures, security concerns and how it can be addressed in future research and related works in emerging economies such as Nigeria.
Measure the Similarity of Complaint Document Using Cosine Similarity Based on...Editor IJCATR
Report handling on "LAPOR!" (Laporan, Aspirasi dan Pengaduan Online Rakyat) system depending on the system administrator who manually reads every incoming report [3]. Read manually can lead to errors in handling complaints [4] if the data flow is huge and grows rapidly, it needs at least three days to prepare a confirmation and it sensitive to inconsistencies [3]. In this study, the authors propose a model that can measure the identities of the Query (Incoming) with Document (Archive). The authors employed Class-Based Indexing term weighting scheme, and Cosine Similarities to analyse document similarities. CoSimTFIDF, CoSimTFICF and CoSimTFIDFICF values used in classification as feature for K-Nearest Neighbour (K-NN) classifier. The optimum result evaluation is pre-processing employ 75% of training data ratio and 25% of test data with CoSimTFIDF feature. It deliver a high accuracy 84%. The k = 5 value obtain high accuracy 84.12%
Hangul Recognition Using Support Vector MachineEditor IJCATR
The recognition of Hangul Image is more difficult compared with that of Latin. It could be recognized from the structural arrangement. Hangul is arranged from two dimensions while Latin is only from the left to the right. The current research creates a system to convert Hangul image into Latin text in order to use it as a learning material on reading Hangul. In general, image recognition system is divided into three steps. The first step is preprocessing, which includes binarization, segmentation through connected component-labeling method, and thinning with Zhang Suen to decrease some pattern information. The second is receiving the feature from every single image, whose identification process is done through chain code method. The third is recognizing the process using Support Vector Machine (SVM) with some kernels. It works through letter image and Hangul word recognition. It consists of 34 letters, each of which has 15 different patterns. The whole patterns are 510, divided into 3 data scenarios. The highest result achieved is 94,7% using SVM kernel polynomial and radial basis function. The level of recognition result is influenced by many trained data. Whilst the recognition process of Hangul word applies to the type 2 Hangul word with 6 different patterns. The difference of these patterns appears from the change of the font type. The chosen fonts for data training are such as Batang, Dotum, Gaeul, Gulim, Malgun Gothic. Arial Unicode MS is used to test the data. The lowest accuracy is achieved through the use of SVM kernel radial basis function, which is 69%. The same result, 72 %, is given by the SVM kernel linear and polynomial.
Application of 3D Printing in EducationEditor IJCATR
This paper provides a review of literature concerning the application of 3D printing in the education system. The review identifies that 3D Printing is being applied across the Educational levels [1] as well as in Libraries, Laboratories, and Distance education systems. The review also finds that 3D Printing is being used to teach both students and trainers about 3D Printing and to develop 3D Printing skills.
Survey on Energy-Efficient Routing Algorithms for Underwater Wireless Sensor ...Editor IJCATR
In underwater environment, for retrieval of information the routing mechanism is used. In routing mechanism there are three to four types of nodes are used, one is sink node which is deployed on the water surface and can collect the information, courier/super/AUV or dolphin powerful nodes are deployed in the middle of the water for forwarding the packets, ordinary nodes are also forwarder nodes which can be deployed from bottom to surface of the water and source nodes are deployed at the seabed which can extract the valuable information from the bottom of the sea. In underwater environment the battery power of the nodes is limited and that power can be enhanced through better selection of the routing algorithm. This paper focuses the energy-efficient routing algorithms for their routing mechanisms to prolong the battery power of the nodes. This paper also focuses the performance analysis of the energy-efficient algorithms under which we can examine the better performance of the route selection mechanism which can prolong the battery power of the node
Comparative analysis on Void Node Removal Routing algorithms for Underwater W...Editor IJCATR
The designing of routing algorithms faces many challenges in underwater environment like: propagation delay, acoustic channel behaviour, limited bandwidth, high bit error rate, limited battery power, underwater pressure, node mobility, localization 3D deployment, and underwater obstacles (voids). This paper focuses the underwater voids which affects the overall performance of the entire network. The majority of the researchers have used the better approaches for removal of voids through alternate path selection mechanism but still research needs improvement. This paper also focuses the architecture and its operation through merits and demerits of the existing algorithms. This research article further focuses the analytical method of the performance analysis of existing algorithms through which we found the better approach for removal of voids
Decay Property for Solutions to Plate Type Equations with Variable CoefficientsEditor IJCATR
In this paper we consider the initial value problem for a plate type equation with variable coefficients and memory in
1 n R n ), which is of regularity-loss property. By using spectrally resolution, we study the pointwise estimates in the spectral
space of the fundamental solution to the corresponding linear problem. Appealing to this pointwise estimates, we obtain the global
existence and the decay estimates of solutions to the semilinear problem by employing the fixed point theorem
Software Engineering and Project Management - Introduction, Modeling Concepts...Prakhyath Rai
Introduction, Modeling Concepts and Class Modeling: What is Object orientation? What is OO development? OO Themes; Evidence for usefulness of OO development; OO modeling history. Modeling
as Design technique: Modeling, abstraction, The Three models. Class Modeling: Object and Class Concept, Link and associations concepts, Generalization and Inheritance, A sample class model, Navigation of class models, and UML diagrams
Building the Analysis Models: Requirement Analysis, Analysis Model Approaches, Data modeling Concepts, Object Oriented Analysis, Scenario-Based Modeling, Flow-Oriented Modeling, class Based Modeling, Creating a Behavioral Model.
VARIABLE FREQUENCY DRIVE. VFDs are widely used in industrial applications for...PIMR BHOPAL
Variable frequency drive .A Variable Frequency Drive (VFD) is an electronic device used to control the speed and torque of an electric motor by varying the frequency and voltage of its power supply. VFDs are widely used in industrial applications for motor control, providing significant energy savings and precise motor operation.
Generative AI Use cases applications solutions and implementation.pdfmahaffeycheryld
Generative AI solutions encompass a range of capabilities from content creation to complex problem-solving across industries. Implementing generative AI involves identifying specific business needs, developing tailored AI models using techniques like GANs and VAEs, and integrating these models into existing workflows. Data quality and continuous model refinement are crucial for effective implementation. Businesses must also consider ethical implications and ensure transparency in AI decision-making. Generative AI's implementation aims to enhance efficiency, creativity, and innovation by leveraging autonomous generation and sophisticated learning algorithms to meet diverse business challenges.
https://www.leewayhertz.com/generative-ai-use-cases-and-applications/
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Build the Next Generation of Apps with the Einstein 1 Platform.
Rejoignez Philippe Ozil pour une session de workshops qui vous guidera à travers les détails de la plateforme Einstein 1, l'importance des données pour la création d'applications d'intelligence artificielle et les différents outils et technologies que Salesforce propose pour vous apporter tous les bénéfices de l'IA.
Digital Twins Computer Networking Paper Presentation.pptxaryanpankaj78
A Digital Twin in computer networking is a virtual representation of a physical network, used to simulate, analyze, and optimize network performance and reliability. It leverages real-time data to enhance network management, predict issues, and improve decision-making processes.
Gas agency management system project report.pdfKamal Acharya
The project entitled "Gas Agency" is done to make the manual process easier by making it a computerized system for billing and maintaining stock. The Gas Agencies get the order request through phone calls or by personal from their customers and deliver the gas cylinders to their address based on their demand and previous delivery date. This process is made computerized and the customer's name, address and stock details are stored in a database. Based on this the billing for a customer is made simple and easier, since a customer order for gas can be accepted only after completing a certain period from the previous delivery. This can be calculated and billed easily through this. There are two types of delivery like domestic purpose use delivery and commercial purpose use delivery. The bill rate and capacity differs for both. This can be easily maintained and charged accordingly.
International Journal of Science and Engineering Applications
Volume 4 Issue 3, 2015, ISSN-2319-7560 (Online)
www.ijsea.com
Modified Page Rank Model in Investigation of
Criminal Gang
Ting LU
Research Center for Cloud Computing
North China University of Technology,
Beijing 100043, China
Qi Gao
Department of Mathematics,
Shandong Normal University,
Jinan 250002, China
Xudong Wang
Department of Information System,
Shanghai University,
Shanghai 201800, China
Cong LIU
Department of Computer Science, Shandong
University of Science and Technology,
Qingdao 266590, China
Abstract: In this paper, we consider a criminal investigation into the collective guilt of some members of a working group. Assuming that the statistics we use are reliable, we present a Page Rank model based on mutual information. First, we use the average mutual information between the non-suspicious and the suspicious topics to score the topics by degree of suspicion. Second, we build the correlation matrix based on the degree of suspicion and acquire the corresponding Markov state transition matrix. Then, we set the initial value for every member of the working group based on the degree of suspicion. Last, we calculate the suspicion degree of each member of the working group. For the small 10-person case, we build the improved Page Rank model; computing the statistics of this case yields a table ranking the members by suspicion degree. Comparing with the results given for this case, we find that the two rankings basically match, indicating that the model we have built is feasible. For the current case, we first obtain a ranking of the 15 topics by suspicion via the mutual-information-based Page Rank model. We then build the connection matrix based on the degree of suspicion, acquire the corresponding Markov state transition matrix, and compute its stable point using the Markov chain. Last, we calculate the suspicion degree of the 83 candidates. The results show that the suspicious people appear at the top of the ranking list while the innocent people appear at the bottom, again indicating that the model is feasible. When the suspicious topics and conspirators change, the model still produces relatively good results. In the current case, we have evidence to believe that Dolores and Jerome, who are senior managers, are significantly suspicious; it is recommended that future attention be paid to them. The Page Rank model based on mutual information takes full account of the information flow in the message distribution network. It can not only handle the statistics of a conspiracy case but also be applied to detect infected cells in a biological network. Finally, we present the advantages and disadvantages of this model and directions for improvement.
Keywords: Page Rank; Markov Chain; Criminal Gangs; Biological Model; Algorithm
1. INTRODUCTION
1.1 Background
With the advent of the information age, the development of information
technology has brought new challenges and opportunities to public security
work. Much attention is now paid to how to integrate and reuse public
security data resources and how to achieve comprehensive application of
global data, automatic association of all kinds of information, and
automatic mining of various clues, all of which support integrated
applications in information handling. Using semantic network analysis and
the relevant models plays a major role in improving the efficiency and
accuracy of detection.
1.2 What is “Crime Gangs”?
A crime gang is a form of crime in which two or more persons jointly
commit one or more crimes. Intelligence information (such as the
collection of the relevant people's conversations in the given case) is
the technical basis of intelligence analysis. Criminal intelligence
analysis rests on gathering intelligence, mining and establishing crime
patterns, and analyzing and judging criminal clues as well as the various
relationships among them. Building an efficient model to analyze the
relevant information will surely provide useful clues for the detection of
cases and the judgment of criminal suspects.
1.3 The Mission of Our Team
In this paper, we introduce the theory of information analysis to solve
the problems, described as follows. In the previous case, our team's duty
is to identify the guilty parties among 10 candidates; the other problem
is a relatively larger case with 83 candidates.
The analysis and processing of these problems are divided into three
steps. First, we use the method of mutual information to score all the
topics mentioned. Second, we acquire the correlation matrix by calculating
the scores of the topics shared among the candidates. Third, by solving
the Markov transfer matrix, we get its stable point, which is the final
index over the 83 candidates. By analyzing this index, the degree of
suspicion of each of the 83 candidates can be determined.
2. PROBLEM ANALYSIS
2.1 Topics’ Suspicious Degree Analysis
First, we analyze the suspicion degree of the talks between members, using
mutual information and entropy from information theory, defined as
M(A,B) = I(A) + I(B) - I(A,B), to obtain the scores of the 12 undetermined
topics and of the suspicious ones (obviously, the scores of the suspicious
topics should be higher than the others). Further details are given in the
model section.
2.2 The Transmission of Suspicious Information in the
Working Group Analysis
The working group is divided into four parts, namely the confirmed
suspects, the people with some suspicion, the undetermined people, and the
innocent (listed by degree of suspicion). Because we refer to Google's
PageRank algorithm, we need to obtain the correlation matrix of this group
of people first. Then we can use this algorithm to get the Markov transfer
matrix and obtain its stable point using the properties of Markov chains.
The correlation matrix is built from the message topics among the 83
people: we use g_ij to represent the total score of all the topics
exchanged from person j to person i, thus getting an 83x83 correlation
matrix.
2.3 The Initial Vector Distribution of the Working Group
Analysis
To solve for the Markov stable point, we need an initial vector
distribution. We define the score of each suspicious person as α (there
are n such people); the score of each innocent person is 0; the score of
each undetermined person is β (there are m such people). They satisfy the
equation nα + mβ = 1, and we then decide the values of α and β by
searching over α with a step size of 0.1.
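The constraint nα + mβ = 1 can be sketched in a few lines. This is an illustrative helper of our own, not code from the paper; the function names are hypothetical.

```python
def beta_for(alpha, n, m):
    """Score beta for each of the m undetermined people, given that each of
    the n suspicious people scores alpha, innocents score 0, and the
    constraint n*alpha + m*beta = 1 must hold."""
    return (1 - n * alpha) / m

def candidate_initial_vectors(n, m, step=0.1):
    """Search alpha with the paper's step size of 0.1, keeping only values
    of alpha that leave a non-negative beta."""
    out = []
    alpha = step
    while n * alpha <= 1 + 1e-12:
        beta = beta_for(alpha, n, m)
        if beta >= 0:
            out.append((round(alpha, 10), beta))
        alpha += step
    return out
```

For example, with n = 7 suspects and m = 10 undetermined people, α = 0.1 forces β = 0.03 so that the scores sum to 1.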
3. BASIC ASSUMPTIONS OF OUR
MODEL
3.1. All data is true and reliable.
3.2. There is valid evidence to prove the people who are
suspicious.
3.3. There is valid evidence to prove the people who are
innocent.
3.4. The suspicious topics have a higher degree of doubt than the
undetermined ones.
3.5. An objective, not subjective, attitude is kept in the process of
handling the cases.
3.6. These topics give a valid summary of the messages.
4. MODEL AND SOLUTION
Our whole model is divided into three parts, described in Sections 4.1,
4.2 and 4.3. In Section 4.4 we use the model to analyze a simple example,
and in Sections 4.5 through 4.7 we solve the real problem presented in
question 1, its changed version in question 2, and question 3.
4.1 Mutual Information Algorithm
Let us give a simple example to better illustrate this algorithm. "What
good weather today! I would like to go fishing," A says to B. The words
involved in this sentence, such as "today", "weather", "good", "I" and
"fishing", can be seen as keywords, which we represent as a_1, a_2, …, a_n.
When B replies to A, "Yeah, really good weather! I will go climbing," the
words "weather", "climbing", "good" and "I" can likewise be extracted as
keywords, which we represent as b_1, b_2, …, b_m. Here we list the
relationship between the keywords of these two speakers.
Table 1: The language relationship table

      today  good  fishing  weather  I  climbing
A       1      1      1        1     1     0
B       0      1      0        1     1     1

Notes: "1" indicates that the speaker mentioned the keyword in the chat
(for example, the "1" under "today" in row A means that A said "today"),
and "0" indicates that the keyword was not mentioned (for example, the "0"
under "today" in row B means that B did not say it).
We can use the mutual information formula M(X,Y) = I(X) + I(Y) - I(X,Y) to
get the value of the mutual information, where I(X) represents the entropy
of A's keyword vector and I(Y) that of B's:

I(X) = -P_0 log P_0 - P_1 log P_1        (4.1-1)
I(Y) = -P_0 log P_0 - P_1 log P_1        (4.1-2)
I(X,Y) = -P_10 log P_10 - P_11 log P_11 - P_01 log P_01        (4.1-3)

with P_0 and P_1 denoting the probabilities of 0 and 1 respectively, and
P_10, P_11 and P_01 denoting the probabilities of the pairs (1,0), (1,1)
and (0,1) respectively.
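The computation above can be sketched as follows. This is a minimal illustration of M(X,Y) = I(X) + I(Y) - I(X,Y) on the binary keyword vectors of Table 1; the function names are ours.

```python
import math
from collections import Counter

def entropy(seq):
    """Shannon entropy (base-2) of a sequence of symbols."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())

def mutual_information(x, y):
    """M(X, Y) = I(X) + I(Y) - I(X, Y) for two equal-length 0/1 vectors,
    where I(X, Y) is the entropy of the paired (joint) sequence."""
    joint = list(zip(x, y))
    return entropy(x) + entropy(y) - entropy(joint)

# Keyword indicator vectors for A and B over the shared vocabulary of Table 1
a = [1, 1, 1, 1, 1, 0]
b = [0, 1, 0, 1, 1, 1]
print(mutual_information(a, b))
```

Note that identical vectors give M equal to the vector's own entropy, while independent vectors give M = 0, which matches the intuition that higher M means more shared content.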
4.2 Markov Chain Model
To obtain the Markov state transition matrix, we first need the 83x83
correlation matrix (rows represent a person's outgoing words, columns
represent the words that others said to him or her). For each topic, we
sum up the weights of the words involved in the conversations between one
person and the others to get that person's total weight; doing the same
for every person yields the 83x83 correlation matrix.
Although we use Google's Page Rank model, some people in the case are not
information receivers, so the sums of a few columns of the correlation
matrix are unavoidably 0. Taking this into account, when computing the
Markov transfer matrix we apply special handling to the correlation matrix
so that every column sum is non-zero.
Define the Markov transfer matrix as A = (a_ij), and let

c_j = Σ_i g_ij        (4.2-1)
r_i = Σ_j g_ij        (4.2-2)

where c_j is the sum of column j of the correlation matrix G = (g_ij),
r_i is the sum of row i, and g_ij is the summed score of the message
topics involved in the conversation from person j to person i.

The elements of the transfer matrix A are then

a_ij = (1 - d)/n + d * g_ij / c_j        (4.2-3)

where d is a model parameter; we set d = 0.85 based on experience.
From the basic nature of the Markov chain, a regular Markov chain has a
stationary distribution x = (x_1, x_2, …, x_N)^T; sorting its components
then gives the ranking by conspiracy possibility.
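A minimal sketch of equations (4.2-1) and (4.2-3) and of the stationary point, assuming G is the correlation matrix described above; the zero-column handling (replacing empty columns with a uniform column) is one common choice, stated here as an assumption since the paper does not specify its "special handling".

```python
import numpy as np

def transfer_matrix(G, d=0.85):
    """Damped transfer matrix: a_ij = (1 - d)/n + d * g_ij / c_j, where c_j
    is the column sum of G. Columns summing to zero (people who receive no
    messages) are replaced by a uniform column so A is column-stochastic."""
    G = np.asarray(G, dtype=float)
    n = G.shape[0]
    c = G.sum(axis=0)
    cols = np.where(c > 0, G / np.where(c > 0, c, 1.0), 1.0 / n)
    return (1 - d) / n + d * cols

def stationary(A, tol=1e-12, max_iter=10_000):
    """Power iteration toward the fixed point x with A x = x, sum(x) = 1."""
    n = A.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        x_new = A @ x
        x_new /= x_new.sum()
        if np.abs(x_new - x).max() < tol:
            break
        x = x_new
    return x

# Toy 3-person example; person 2 receives no messages (zero column)
G = [[0, 1, 0], [1, 0, 0], [1, 1, 0]]
x = stationary(transfer_matrix(G))
```

Since every entry of A is strictly positive after damping, the chain is regular and the power iteration converges to the unique stationary distribution.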
4.3 Improved Page Rank Model
As we know, there are 7 known conspirators among the 83 people in the
current case, and each conspirator's score can be defined as P(i),
i = 1, 2, …, 7. The score of each known innocent person can be defined as
P(j), j = 1, 2, …, 8. The model is as follows:

max M(s) = Σ_{i=1}^{7} P(i) - Σ_{j=1}^{8} P(j)        (4.3-1)

s.t.
Ax = x        (4.3-2)
Σ_{i=1}^{n} x_i = 1        (4.3-3)
The objective function expresses the maximum probability gap between the
known conspirators and the known innocents (all the conspirators should
land in the upper part of the list and all the innocents in the lower
part, leaving the undetermined people in the middle). We use k to express
the maximum of the mutual information over the 12 undetermined topics,
searched in steps from its upper bound up to 20; through debugging and
analysis we find that a step value of 0.5 gives results most in line with
the requirements of the subject. s is the initial vector distribution:
each known conspirator receives 0.1, each known innocent receives 0, and
the undetermined people equally share the remaining mass. We let s range
from 0.5 to 0.9 in steps of 0.1, which yields an initial vector
distribution for the whole problem.
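The search over the initial vector distribution can be sketched as below. This is our own illustrative reading of the procedure, not the paper's code: A is assumed to be the column-stochastic transfer matrix of Section 4.2, the iteration budget is finite (for a regular chain, iterating to full convergence would make the result independent of the start), and all names are hypothetical.

```python
import numpy as np

def search_initial_vector(A, conspirators, innocents, undetermined,
                          s_values=(0.5, 0.6, 0.7, 0.8, 0.9), n_iter=50):
    """Grid search over s, the total initial mass given to the known
    conspirators. Each conspirator starts with s/len(conspirators), each
    known innocent with 0, and the undetermined share 1 - s equally. The
    chain is iterated and the s maximizing the gap (4.3-1) between the
    conspirators' and the innocents' scores is kept."""
    n = A.shape[0]
    best = None
    for s in s_values:
        x = np.zeros(n)
        x[list(conspirators)] = s / len(conspirators)
        x[list(undetermined)] = (1 - s) / len(undetermined)
        for _ in range(n_iter):
            x = A @ x
            x /= x.sum()
        gap = x[list(conspirators)].sum() - x[list(innocents)].sum()
        if best is None or gap > best[0]:
            best = (gap, s, x)
    return best
```

The returned triple holds the best gap, the chosen s, and the corresponding score vector, whose sorted components give the suspicion ranking.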
4.4 Solution of the 10-People Case
In practice, we conducted some exploratory analysis on the previous case
and obtained ranking lists of the probability for all the people with
k = 1, 10, 12, 15, 17, 20. Through comparative analysis, we find that
k = 17 gives a ranking more in line with the true outcome of the case than
the other values: people like Inez do not get off, people like Carol are
not falsely accused, and people like Bob do not get the opportunity for
reduced sentences. With k = 17, we obtain the following ranking list of
suspicion degree:
Table 2: The Sorting Table of the Degree of Suspicion (k = 17)
Ranking Number Name
1 7 George
2 2 Bob
3 9 Inez
4 4 Dave
5 5 Ellen
6 1 Anne
7 8 Harry
8 10 Jaye
9 3 Carol
10 6 Fred
Conclusion Analysis:
(1) George and Dave are the given co-conspirators, ranking first and
fourth in the list. Meanwhile, Bob ranks second, suggesting he is one of
the co-conspirators, so he cannot get off. Inez ranks third, meaning that
we can also catch him.
(2) Jaye and Anne are both ranked relatively low, in accordance with the
given information. Based on Carol ranking ninth, we can basically
determine her innocence. Overall, the solution derived from our models
meets the actual requirements and has some promotional value.
From the previous analysis, we know that the 10-person case is a microcosm
of the current case, and its results have been given to us, so
understanding and solving the smaller case serves as a test of our models
and may help us improve them.
4.5 The Solution of the Current Case for Problem 1
When conducting an exploratory analysis on the current case, we used the
same approach as in the first case. We found that k = 17 and k = 20 give
two basically identical probability rankings of all the people, much more
in line with the requirements of the subject.
First, we can obtain the ranking list of the 15 topics using the
Mutual Information Method.
Table 3: Score Ranking List of the 15 Topics
Number Score
1 0.883016
2 0.857865
3 0.898072
4 0.87294
5 0.872976
6 0.857882
7 1
8 0.888021
9 0.883014
10 0.893037
11 1
12 0.888021
13 1
14 0.888034
15 0.883008
Notes: The numbers of suspicious topics are 7, 11, 13.
The suspicion-degree ranking of the suspected, the innocent and the senior
managers, based on k = 17, is as follows.
Table 4: The Sorting Table Based on the Suspicion Degree of the
Conspirators, Non-conspirators and Senior Managers (k = 17)
Ranking Name Number
1 Yao 68
2 Alex 22
4 Paul 44
6 Harvey 50
7 Dolores 11
8 Ulf 55
17 Jerome 17
19 Jean 19
27 Paige 3
28 Elsie 38
31 Darlene 49
46 Chris 1
57 Tran 65
58 Ellin 69
60 Gretchen 5
Notes: The senior managers are Dolores, Jerome and Gretchen.
Conclusion Analysis:
(1) By analyzing the result, we find that two senior managers, namely
Dolores and Jerome, are both conspirators.
(2) The already-known conspirators appear at the forefront of the ranking
list, so we can basically conclude that they are conspirators.
(3) The already-known non-conspirators are all at the rear of the ranking
list; except for Paige, who ranks 27th, they can be determined to be
innocent. Despite a few minor differences, this result is very much in
line with the actual situation, which demonstrates the feasibility and
stability of the model.
4.6 The Solution of the Current Case for Problem 2
As to the second requirement, we can also use the improved Page Rank model
to analyze the data. First we obtain a modified ranking list of the 15
topics, as follows:
Table 5: Score Ranking List of the 15 Modified Topics
Number Score
1 1
2 1.177737
3 1.20412
4 1.193429
5 1.172717
6 1.167063
7 1
8 1.199102
9 1.193429
10 1.20412
11 1
12 1.20412
13 1
14 1.173339
15 1.193429
Then we can get the ranking list of the suspicion degree of the suspected,
the innocent and the senior managers.
Table 6: The Sorting Table Based on the Suspicion Degree of the
Conspirators, Non-conspirators and Senior Managers
Number Name
1 Yao
2 Alex
4 Paul
6 Harvey
7 Dolores
8 Ulf
17 Jerome
19 Jean
27 Paige
28 Elsie
30 Darlene
45 Chris
57 Ellin
58 Tran
60 Gretchen
Notes: The senior managers are Dolores, Jerome and Gretchen; the 7
suspected people are Yao, Alex, Paul, Harvey, Ulf, Elsie and the newly
added Chris.
Conclusion Analysis
The newly added conspirator Chris does not make much difference in the
results compared with those of the first requirement. This may be due to
Chris's inactive involvement in the conversations: even if he is one of
the conspirators, his influence on others is very weak, resulting in
basically similar results.
4.7 The Solution of the Current Case for Problem 3
In this section, we introduce the concept of semantic networks and apply
similarity theory to the calculation of topic scores.
Since we have obtained the original messages, which form a much bigger
corpus than the one used in the first two questions, we can apply
similarity analysis in our model. In detail:

First: calculate the similarities between the 3 suspicious messages
A = {A_1, A_2, A_3} and the 12 unsuspicious messages B = {B_1, B_2, …, B_12}:

Sim(A_i, B_j) = λ / d        (4.7-1)

where d is the distance between the two corpora and λ is a tunable
parameter (usually between 0 and 1).

Second: compute the index rank of B_1, B_2, …, B_12:

B_j = Σ_{i=1}^{3} Sim(A_i, B_j), j = 1, 2, …, 12        (4.7-2)

Third: set the values of A_1, A_2, A_3:

A_i = max_{j=1,…,12} B_j, i = 1, 2, 3        (4.7-3)

In this way we obtain a ranking of the 15 messages; compared with the
ranking based on the mutual information index, this model theoretically
provides a more credible result.
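The three steps above can be sketched as follows, assuming a precomputed matrix of corpus distances; the parameter name `lam` stands in for the tunable parameter of (4.7-1), and all function names are illustrative rather than the paper's.

```python
def similarity(dist, lam=0.5):
    """Sim(A_i, B_j) = lam / d: closer corpora (smaller d) score higher."""
    return lam / dist

def score_topics(distances, lam=0.5):
    """distances[i][j] is the distance between suspicious message A_i and
    unsuspicious message B_j. Step 2: B_j's score is the summed similarity
    to the suspicious messages. Step 3: each A_i is set to the maximum B
    score, so the suspicious topics top the ranking."""
    n_b = len(distances[0])
    b_scores = [
        sum(similarity(distances[i][j], lam) for i in range(len(distances)))
        for j in range(n_b)
    ]
    a_score = max(b_scores)
    return a_score, b_scores
```

For example, with three suspicious messages whose distances to two candidate messages are [[1, 2], [2, 4], [4, 8]], the first candidate scores 0.875 and the second 0.4375, so the first is ranked as more suspicious.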
5. ADVANTAGES AND
DISADVANTAGES
5.1 Advantages:
We introduced the concept of mutual information when scoring topics,
building a relationship between the suspicious topics and the undetermined
ones.
5.2 Disadvantages:
● As mentioned in requirement 3, due to the limitation of the message
traffic, we can only roughly determine the suspicion degree of the topics
by semantic network analysis.
● Through the computation of mutual information we can only determine the
values of the 12 undetermined topics; the values of the 3 highly suspected
ones are assigned artificially.
● Without much prior experience to draw on for the model, the accuracy of
the results may be undermined.
6. PROMOTION OF THE MODEL
Through the analysis, we found that our model generalizes well; for
example, it could be used to find infected or diseased cells in a
biological network. Briefly, if we can extract meaningful characteristics
from all the cells, we can calculate the mutual information between these
characteristics. We can then obtain the mutual information between the
undetermined cells and the infected ones so as to get a probability
ranking of the undetermined cells. Throughout the detection process, we
may use the improved search-based Page Rank model and the Markov transfer
matrix to simplify the computation as well as increase the validity of the
research.
We introduced mutual information theory in the first two parts of this
problem, with relatively satisfactory results. Semantic network analysis
is widely used in artificial intelligence and computational linguistics;
it provides a structure and process for reasoning about knowledge or
language. Another useful computational linguistics capability in natural
language processing is text analysis.
7. CONCLUSION
In this paper, we introduce an improved Page Rank model based on mutual
information and successfully solve the case problems. The conclusions are
as follows. In the previous case, George and Dave are the given
co-conspirators; Bob and Inez are shown to be further co-conspirators;
Carol is innocent. Overall, the solution derived from our models meets the
actual requirements and has some promotional value. In the next case,
analysis of the result shows that two senior managers, namely Dolores and
Jerome, are both conspirators, which will help the investigation of the
case. In the other version of the second case, the newly added suspect
Chris does not make much difference in the results compared with those of
the first requirement; this may be due to Chris's inactive involvement in
the conversations, so even if he is one of the conspirators, his influence
on others is very weak, resulting in basically similar results.
All in all, since we only use topics derived from the original messages in
the mutual information analysis, our result would be more valid with a
larger corpus. That is to say, we can introduce semantic networks and use
similarity computing to acquire a more precise result, so that our
judgment will be more reasonable.
8. ACKNOWLEDGMENTS
This work was supported in part by the Graduate Innovation Foundation
Project of Shandong University of Science and Technology (YC140106). We
would also like to thank the anonymous reviewers for their constructive
comments, which helped us improve the paper.