Vertical selection is the task of selecting the verticals most relevant to a given query in order to improve the diversity and quality of web search results. This task requires not only predicting relevant verticals but also ensuring that these verticals are the ones the user expects to be relevant for their particular information need. Most existing work has focused on traditional machine learning techniques that combine multiple types of features to select relevant verticals. Although these techniques are efficient, handling vertical selection with high accuracy remains a challenging research task. In this paper, we propose an approach for improving vertical selection in order to satisfy the user's vertical intent and reduce browsing time and effort. First, it generates query embedding vectors using the doc2vec algorithm, which preserves syntactic and semantic information within each query. Second, this vector is used as input to a convolutional neural network model that enriches the representation of the query with multiple levels of abstraction, including rich semantic information, and then creates a global summarization of the query features. We demonstrate the effectiveness of our approach through comprehensive experiments on various datasets. Our findings show that our system achieves high accuracy and makes accurate predictions on new, unseen data.
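The pipeline this abstract describes (word vectors for a query passed through convolutional filters whose responses are max-pooled into a fixed-size summary) can be sketched in miniature. The vocabulary, embedding vectors, and filters below are random stand-ins, not the paper's trained models; a real system would train gensim's Doc2Vec on a query corpus and learn the filters end-to-end:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for doc2vec: each word of a query maps to a fixed 4-dim vector.
vocab = {"best": 0, "python": 1, "tutorial": 2, "news": 3, "today": 4}
embeddings = rng.normal(size=(len(vocab), 4))

def embed_query(query):
    """Stack word vectors into a (num_words, dim) matrix."""
    return np.stack([embeddings[vocab[w]] for w in query.split()])

def conv_max_pool(x, filters, width=2):
    """1-D convolution over word positions followed by global max pooling,
    mimicking how a CNN summarizes a query into a fixed-size feature vector."""
    n, d = x.shape
    feats = []
    for f in filters:                       # f has shape (width, d)
        scores = [np.sum(x[i:i + width] * f) for i in range(n - width + 1)]
        feats.append(max(scores))           # global max pooling
    return np.array(feats)

filters = rng.normal(size=(3, 2, 4))        # 3 filters of width 2
summary = conv_max_pool(embed_query("best python tutorial"), filters)
print(summary.shape)                        # one feature per filter: (3,)
```

The max-pooled vector has a fixed length regardless of query length, which is what lets a downstream classifier score each vertical.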
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUE (IJDKP)
High prediction accuracy of students' performance is helpful for identifying low-performing students at the beginning of the learning process, and data mining is used to attain this objective. Data mining techniques discover models or patterns in data and are very helpful in decision-making. Boosting is among the most popular techniques for constructing ensembles of classifiers to improve classification accuracy. Adaptive Boosting (AdaBoost) is a widely used boosting algorithm; it handles binary classification and is not directly applicable to multiclass classification. The SAMME boosting technique extends AdaBoost to multiclass classification without reducing it to a set of binary sub-problems. In this paper, a students' performance prediction system using Multi Agent Data Mining is proposed to predict the performance of students from their data with high prediction accuracy and to provide help to low-performing students through optimization rules. The proposed system has been implemented and evaluated by investigating the prediction accuracy of the AdaBoost.M1 and LogitBoost ensemble classifier methods and of the C4.5 single-classifier method. The results show that the SAMME boosting technique improves prediction accuracy and outperforms the C4.5 single classifier and LogitBoost.
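The way SAMME extends AdaBoost to K classes shows up directly in its classifier-weight formula, which adds a log(K-1) term to AdaBoost's weight. A minimal sketch of just that formula (the full algorithm also reweights training examples each round):

```python
import math

def samme_alpha(err, n_classes):
    """Classifier weight in SAMME: alpha = log((1-err)/err) + log(K-1).
    With K == 2 the extra term vanishes and AdaBoost's weight is recovered."""
    return math.log((1 - err) / err) + math.log(n_classes - 1)

# AdaBoost (K=2) needs err < 1/2 for a positive weight; SAMME only needs
# err < (K-1)/K, i.e. better than random guessing among K classes.
assert abs(samme_alpha(0.3, 2) - math.log(0.7 / 0.3)) < 1e-12
print(round(samme_alpha(0.7, 5), 3))  # positive even though err > 0.5
```

This is why SAMME can use weak learners that would be useless to binary AdaBoost: with 5 classes, 30% accuracy still beats the 20% random baseline.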
Predicting students' performance using ID3 and C4.5 classification algorithms (IJDKP)
An educational institution needs approximate prior knowledge of enrolled students to predict their performance in future academics. This helps it identify promising students and also provides an opportunity to pay attention to and improve those who would probably get lower grades. As a solution, we have developed a system which can predict the performance of students from their previous performance using classification techniques from data mining. We have analyzed a data set containing information about students, such as gender, marks scored in the board examinations of classes X and XII, marks and rank in entrance examinations, and first-year results of the previous batch of students. By applying the ID3 (Iterative Dichotomiser 3) and C4.5 classification algorithms to this data, we have predicted the general and individual performance of freshly admitted students in future examinations.
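ID3 chooses its splits by information gain: the reduction in label entropy after partitioning on an attribute (C4.5 refines this to gain ratio). The criterion can be computed directly; the toy student records below are hypothetical:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, labels):
    """ID3's splitting criterion: entropy reduction from splitting on attr."""
    n = len(rows)
    split = {}
    for r, y in zip(rows, labels):
        split.setdefault(r[attr], []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder

# Hypothetical records: entrance-exam band -> first-year result
rows = [{"band": "high"}, {"band": "high"}, {"band": "low"}, {"band": "low"}]
labels = ["pass", "pass", "fail", "pass"]
print(round(info_gain(rows, "band", labels), 3))
```

The tree builder simply picks, at each node, the attribute with the largest gain and recurses on each partition.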
Multi Label Spatial Semi Supervised Classification using Spatial Associative ... (cscpconf)
Multi-label spatial classification based on association rules with multi-objective genetic algorithms (MOGA), enriched by semi-supervised learning, is proposed in this paper to deal with the multiple-class-label problem. We adapt problem transformation for multi-label classification and use a hybrid evolutionary algorithm to optimize the generation of spatial association rules, which addresses single labels. MOGA is used to combine the single labels into multi-labels under the conflicting objectives of predictive accuracy and comprehensibility. Semi-supervised learning is done through rule-cover clustering. Finally, an associative classifier is built with a sorting mechanism. The algorithm is simulated and the results are compared with a MOGA-based associative classifier; the proposed method outperforms the existing one.
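Optimizing the two conflicting objectives (predictive accuracy and comprehensibility) is what makes this a multi-objective problem: a MOGA keeps the rules that are Pareto-optimal rather than a single best rule. A minimal dominance check over made-up rule scores:

```python
def dominates(a, b):
    """Pareto dominance for (accuracy, comprehensibility) tuples: a dominates
    b when it is no worse in both objectives and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

# Hypothetical rules scored as (accuracy, comprehensibility)
rules = {"r1": (0.90, 0.40), "r2": (0.85, 0.70), "r3": (0.80, 0.30)}

# The Pareto front: rules not dominated by any other rule
front = [n for n, s in rules.items()
         if not any(dominates(t, s) for m, t in rules.items() if m != n)]
print(front)
```

r3 is dominated by r1 (worse on both objectives), while r1 and r2 trade accuracy against comprehensibility, so both survive; the genetic algorithm evolves toward such a front.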
On the benefit of logic-based machine learning to learn pairwise comparisons (journalBEEI)
In recent years, many daily processes such as internet web searching, e-mail filtering, social media services, and e-commerce have benefited from machine learning (ML) techniques. The implementation of ML techniques has been largely focused on black-box methods whose general conclusions are not easily interpretable. Hence, elaborating with other declarative software models to identify the correctness and completeness of the models is not easy to perform. On the other hand, some logic-based machine learning techniques, with the advantage of their white-box approach, have proven to be well suited for many software engineering tasks. In this paper, we propose the use of a logic-based approach to learn user preferences in the form of pairwise comparisons. APARELL, a novel inductive learning approach, is able to model the user's preferences in a description logic representation. This offers a rich, relational representation which can then be used to produce a set of recommendations. A user study was performed in our experiment to evaluate the implementation of the pairwise preference recommender system compared to a standard list interface. The result of the experiment shows that the pairwise interface was significantly better than the other interface in many ways.
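To see how pairwise comparisons become a recommendation list, consider the simplest possible aggregation: count how often each item wins a comparison. APARELL itself induces description-logic rules from such pairs; win counting is only a minimal stand-in, and the car names below are hypothetical:

```python
from collections import Counter

# Hypothetical pairwise preferences elicited from a user: (preferred, other)
preferences = [("carA", "carB"), ("carA", "carC"), ("carB", "carC")]

wins = Counter(winner for winner, _ in preferences)
items = {i for pair in preferences for i in pair}

# Rank items by how many comparisons they won
ranking = sorted(items, key=lambda i: -wins[i])
print(ranking)  # carA won twice, carB once, carC never
```

The appeal of the logic-based approach over such counting is that the learned rules ("the user prefers diesel cars over petrol cars") are themselves readable, which is exactly the white-box property the abstract emphasizes.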
Sentimental classification analysis of polarity multi-view textual data using... (IJECEIAES)
The data and information available in most community environments are complex in nature. Sentimental data resources may consist of textual data collected from multiple information sources with different representations, usually handled by different analytical models. Data resources with these characteristics form multi-view polarity textual data. However, knowledge creation from this type of sentimental textual data requires considerable analytical effort and capability. In particular, data mining practices can provide exceptional results in handling textual data formats. Moreover, when textual data exists in multi-view or unstructured formats, hybrid and integrated text data mining algorithms are vital for obtaining helpful results. The objective of this research is to enhance knowledge discovery from sentimental multi-view textual data, which can be considered an unstructured data format, by classifying polarity information documents into two categories of useful information. A framework with integrated data mining algorithms is proposed in this paper, achieved through the application of the X-means algorithm for clustering and the HotSpot algorithm for association rules. The analysis results show improved accuracy in classifying sentimental multi-view textual data into two categories when the proposed framework is applied to an online polarity user-reviews dataset on given topics.
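Association-rule miners such as HotSpot ultimately score rules by metrics like support and confidence; the exact search strategy differs, but the core metrics can be sketched on hypothetical tokenized reviews tagged with a polarity label:

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item of itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Rule strength: estimated P(consequent | antecedent)."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)

# Hypothetical review token sets, each tagged "pos" or "neg"
reviews = [{"great", "battery", "pos"},
           {"great", "screen", "pos"},
           {"poor", "battery", "neg"},
           {"great", "pos"}]

# Rule: "great" -> "pos"
print(confidence(reviews, {"great"}, {"pos"}))
```

A rule whose confidence is high on one polarity class and low on the other is exactly the kind of pattern useful for the two-category classification the abstract describes.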
An Empirical Study of the Applications of Classification Techniques in Studen... (IJERA Editor)
University servers and databases store a huge amount of data, including personal details, registration details, evaluation assessments, performance profiles, and much more for students and lecturers alike. The main problem facing any system administrator or user is that data grows every second and is stored in different types and formats on the servers, which makes learning about students from this volume of data difficult. Predicting graduation and future academic information, and maintaining the structure and content of courses according to previous results, have become important. The objectives of this paper are to extract knowledge from an incomplete data structure and to determine the suitable data mining method or technique for extracting knowledge from a huge amount of student data, helping the administration use technology to make quick decisions. Data mining aims to discover useful information or knowledge using one of several techniques; this paper uses a classification technique to discover knowledge from a students' server database in which all students' information was registered and stored. The classification task uses the C4.5 classifier tree to predict students' final academic results (grades). The data cover a four-year period [2006-2009]. Experimental results show that the classification process succeeded on the training set: the predicted instances are similar to the training set, which supports the suggested classification model. The efficiency and effectiveness of the C4.5 algorithm in predicting academic results (grades) is very good, and the model can also improve the efficiency of academic result retrieval and evidently promote retrieval precision.
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION (cscpconf)
While designing a new type of engineering material, one has to search for existing materials which suit the design requirements before trying to produce a new kind of engineering material. This selection process is tedious, as a few materials must be selected out of a set of lakhs of materials. Therefore, this paper proposes a model to select a particular material which suits the user's requirements using similarity/distance measuring functions. Thirteen different types of similarity/distance measuring functions are examined. A Performance Index Measure (PIM) is calculated to verify the relative performance of the selected material against the target material, and all the results are normalised for analysis. The proposed model thus reduces the time wasted in selection and avoids haphazard selection of materials in materials design and manufacturing industries.
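Three representative similarity/distance functions of the kind the paper examines (the property vectors below are hypothetical, normalised values, and the material names are made up):

```python
import math

def euclidean(a, b):
    """Straight-line distance between property vectors (smaller = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_sim(a, b):
    """Angle-based similarity, insensitive to vector magnitude (larger = more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

target = (0.8, 0.3, 0.6)                      # desired property profile
materials = {"steelA": (0.7, 0.4, 0.5), "alloyB": (0.2, 0.9, 0.1)}

# Pick the candidate closest to the target under one chosen measure
best = min(materials, key=lambda m: euclidean(materials[m], target))
print(best)
```

Different measures can rank candidates differently, which is why the paper compares thirteen of them against a performance index rather than trusting any single one.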
03 fauzi indonesian 9456 11nov17 edit septian (IAESIJEECS)
Since the rise of the WWW, the information available online has grown rapidly; Indonesian online news is one example. Automatic text classification has therefore become a very important task for information filtering. One of the major issues in text classification is the high dimensionality of the feature space: most features are irrelevant, noisy, or redundant, which may reduce the accuracy of the system. Hence, feature selection is needed. Maximal Marginal Relevance for Feature Selection (MMR-FS) has been proven to be a good feature selection method for text with many redundant features, but it has high computational complexity. In this paper, we propose a two-phase feature selection method. In the first phase, to lower the complexity of MMR-FS, we use Information Gain to reduce the features; the reduced feature set is then selected by MMR-FS in the second phase. The experimental results show that our new method reaches the best accuracy of 86%. The new method lowers the complexity of MMR-FS while retaining its accuracy.
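The two-phase idea can be sketched as: a cheap relevance filter first, then a greedy MMR-style pass that trades relevance against redundancy with the already-selected features. The relevance scores and redundancy matrix below are made up for illustration:

```python
# Hypothetical per-feature relevance (e.g. information gain) and pairwise redundancy
relevance = {"f1": 0.9, "f2": 0.85, "f3": 0.5, "f4": 0.1}
redundancy = {("f1", "f2"): 0.95, ("f1", "f3"): 0.1, ("f2", "f3"): 0.2}

def red(a, b):
    return redundancy.get((a, b), redundancy.get((b, a), 0.0))

def mmr_select(candidates, k, lam=0.7):
    """Greedily pick k features maximizing lam*relevance - (1-lam)*redundancy."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        best = max(pool, key=lambda f: lam * relevance[f]
                   - (1 - lam) * max((red(f, s) for s in selected), default=0.0))
        selected.append(best)
        pool.remove(best)
    return selected

# Phase 1: keep the top features by information gain (cheap filter)
top_by_ig = sorted(relevance, key=relevance.get, reverse=True)[:3]
# Phase 2: MMR-style selection on the reduced pool
print(mmr_select(top_by_ig, k=2))
```

Note the outcome: f2 has the second-highest relevance but is nearly redundant with f1, so the less relevant yet non-redundant f3 wins the second slot; the phase-1 filter keeps the quadratic redundancy comparisons confined to a small pool.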
Data mining, or knowledge discovery in databases (KDD), is the extraction of unknown (hidden) and useful knowledge from data. Data mining is widely used in many areas such as retail, sales, e-commerce, remote sensing, and bioinformatics. Students' performance has become one of the most complex puzzles for universities and colleges in the recent past, given their tremendous growth. In this paper, the authors deploy data mining techniques such as classification, association rules, and chi-square for knowledge discovery. For this study, the authors used a data set containing the results of approximately 180 MCA (postgraduate) students from 3 colleges. The study found that one can apply data mining functionalities such as chi-square, association rules, and lift in education and discover areas of improvement.
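Of the functionalities named above, lift is the easiest to state concretely: it measures how much more often two events co-occur than independence would predict. A sketch on hypothetical student records:

```python
def lift(transactions, a, b):
    """Lift of rule a -> b: > 1 means a and b co-occur more than by chance."""
    n = len(transactions)
    p_a = sum(a <= t for t in transactions) / n
    p_b = sum(b <= t for t in transactions) / n
    p_ab = sum((a | b) <= t for t in transactions) / n
    return p_ab / (p_a * p_b)

# Hypothetical records: subjects passed by each MCA student
records = [{"math", "programming"}, {"math", "programming"},
           {"math"}, {"networks"}]

print(lift(records, {"math"}, {"programming"}))  # > 1: positively associated
```

A lift above 1 for, say, "passed math -> passed programming" is the kind of actionable association such a study surfaces; a chi-square test then checks whether the association is statistically significant.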
Performance Evaluation of Query Processing Techniques in Information Retrieval (idescitation)
The first element of the search process is the query. The user query, being on average restricted to two or three keywords, is ambiguous to the search engine. Given the user query, the goal of an Information Retrieval (IR) system is to retrieve information which might be useful or relevant to the information need of the user; hence, query processing plays an important role in an IR system. Query processing can be divided into four categories: query expansion, query optimization, query classification, and query parsing. In this paper an attempt is made to evaluate the performance of query processing algorithms in each category. The evaluation was based on the dataset specified by the Forum for Information Retrieval Evaluation [FIRE15]. The criteria used for evaluation are precision and relative recall, and the analysis is based on the importance of each step in query processing. The experimental results show the significance of each step in query processing as well as the relevance of web semantics and spelling correction in the user query.
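The two evaluation criteria can be computed from retrieved/relevant document sets. Relative recall is commonly defined against the pool of relevant documents found by all compared systems, since the full relevant set is unknown; the document IDs below are made up:

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved)

def relative_recall(retrieved, relevant, pooled_relevant):
    """Recall measured against the union of relevant documents
    retrieved by all compared systems, not the (unknown) full set."""
    return len(retrieved & relevant) / len(pooled_relevant)

relevant = {"d1", "d2", "d3", "d4"}
system_a = {"d1", "d2", "d5"}
system_b = {"d2", "d3"}

pooled = (system_a | system_b) & relevant       # relevant docs any system found
print(precision(system_a, relevant), relative_recall(system_a, relevant, pooled))
```

Comparing these two numbers per query-processing step is how one step (e.g. expansion) can be shown to help recall while hurting precision, or vice versa.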
PREDICTING ACADEMIC MAJOR OF STUDENTS USING BAYESIAN NETWORKS TO THE CASE OF ... (ijcax)
This study took place this year in the city of Maragheh, Iran. A number of high school students in the fields of mathematics, experimental sciences, humanities, vocational studies, business, and science were studied and compared. The purpose of this research is to predict the academic major of high school students using Bayesian networks. The factors affecting academic major selection are used for the first time as indicators in the Bayesian networks. Evaluating the impacts of the indicators on each other, discretizing the data, and processing them were performed with GeNIe. The proper course of study can then be advised to students continuing their education.
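At its core, predicting a major from an indicator in a Bayesian network is a posterior computation over conditional probability tables. A single-node sketch with made-up numbers (the paper builds its full network and discretization in GeNIe):

```python
# Priors over majors and a hypothetical CPT P(interest | major)
p_major = {"math": 0.3, "science": 0.4, "humanities": 0.3}
p_interest_given_major = {
    ("lab_work", "math"): 0.2,
    ("lab_work", "science"): 0.7,
    ("lab_work", "humanities"): 0.1,
}

def posterior(interest):
    """Bayes' rule: P(major | interest) proportional to P(major) * P(interest | major)."""
    joint = {m: p_major[m] * p_interest_given_major[(interest, m)] for m in p_major}
    z = sum(joint.values())
    return {m: v / z for m, v in joint.items()}

post = posterior("lab_work")
print(max(post, key=post.get))  # the most probable major given this evidence
```

A real network chains many such tables (grades, family factors, interests) and propagates evidence through all of them, but each step is this same multiply-and-normalize.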
Quantitative and Qualitative Analysis of Time-Series Classification using Dee... (Nader Ale Ebrahim)
Time-series classification is utilized in a variety of applications, leading to the development of many data mining techniques for time-series analysis. Among the broad range of time-series classification algorithms, recent studies consider the impact of deep learning methods on time-series classification tasks. The quantity of related publications warrants a bibliometric study to explore the most prominent keywords, countries, sources, and research clusters. This paper conducts a bibliometric analysis of related publications in time-series classification drawn from the Scopus database between 2010 and 2019. Through keyword co-occurrence analysis, a visual network structure of the top keywords in time-series classification research has been produced, and deep learning emerges as the most common topic upon further inquiry into the bibliography. The paper continues by exploring the publication trends of recent deep learning approaches for time-series classification. The annual number of publications, the productive and collaborative countries, the growth rate of sources, the most frequent keywords, and the research collaborations within the study period are revealed by the bibliometric analysis. The research field is broken down into three main categories: different frameworks of deep neural networks, applications in remote sensing, and applications in signal processing for time-series classification tasks. The qualitative analysis highlights the categories of the most cited papers by describing them in detail.
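The keyword co-occurrence network underlying such an analysis is built by counting, for every keyword pair, how many papers list both. A sketch on hypothetical per-paper keyword lists of the kind a Scopus export provides:

```python
from collections import Counter
from itertools import combinations

# Hypothetical author-keyword lists, one list per paper
papers = [["time-series", "deep learning", "CNN"],
          ["time-series", "deep learning"],
          ["time-series", "remote sensing"]]

cooc = Counter()
for kws in papers:
    # Each unordered pair of distinct keywords in one paper is one co-occurrence
    for a, b in combinations(sorted(set(kws)), 2):
        cooc[(a, b)] += 1

print(cooc[("deep learning", "time-series")])  # both appear in 2 papers
```

Edge weights in the visual network are exactly these counts; clustering that weighted graph is what yields the research clusters the abstract mentions.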
DEVELOPING A CASE-BASED RETRIEVAL SYSTEM FOR SUPPORTING IT CUSTOMERS (IJCSEA Journal)
The case-based reasoning (CBR) approach is a modern approach adopted for designing knowledge-based expert systems. It depends heavily on stored experiences, which serve as cases that can be employed to solve new problems by retrieving similar cases from the system and utilizing their solutions; in other words, it solves problems by reviewing, processing, and applying prior experiences. The present study sheds light on a CBR application developed to assist Internet users and solve the problems they face, with a focus on the case retrieval stage. It aimed at designing and building an experience-based inquiry system that solves any problem internet users might face through a case-based reasoning (CBR) dialogue. The system developed by the researcher operates by displaying similar cases through a dialogue, using a well-developed algorithm informed by a review of relevant previous studies. It was found that the success rate of the proposed system is high.
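The retrieval stage the study focuses on reduces to scoring stored cases against the new problem and reusing the best match's solution. A minimal nearest-case sketch (the paper's own algorithm is dialogue-driven; the cases below are hypothetical):

```python
def similarity(case, query):
    """Fraction of shared attributes with matching values,
    a simple nearest-neighbour measure for case retrieval."""
    keys = case.keys() & query.keys()
    return sum(case[k] == query[k] for k in keys) / len(keys)

# Hypothetical case base: internet problems with known solutions
cases = [{"symptom": "no_connection", "device": "router", "solution": "restart router"},
         {"symptom": "slow_speed", "device": "router", "solution": "change channel"}]

query = {"symptom": "no_connection", "device": "router"}
best = max(cases, key=lambda c: similarity(c, query))
print(best["solution"])
```

In a dialogue-based system, each question the user answers adds an attribute to `query`, progressively narrowing which stored cases score highest.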
Data Mining Application in Advertisement Management of Higher Educational Ins... (ijcax)
In recent years, competition among Indian higher educational institutes to attract students for enrollment has grown rapidly. To attract students, educational institutes must select the best advertisement method. Different advertisements are available in the market, but selecting among them is very difficult for institutes. This paper helps institutes select the best advertisement medium using some data mining methods.
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM : ... (IJNSA Journal)
In health research, one of the major tasks is to retrieve and analyze heterogeneous databases containing a single patient's information gathered from a large volume of data over a long period of time. The main objective of this paper is to present our ontology-based information retrieval approach for a clinical information system. We have performed a case study in real-life hospital settings. The results obtained illustrate the feasibility of the proposed approach, which significantly improved the information retrieval process on a large volume of data over a long period, from August 2011 until January 2012.
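One common mechanism behind ontology-driven retrieval is query expansion: a query term is broadened with its narrower terms from the ontology, so documents using more specific clinical vocabulary are still matched. A sketch over a made-up mini ontology (the paper's actual ontology and matching strategy are its own):

```python
# Hypothetical mini clinical ontology: term -> narrower terms (subclasses)
ontology = {"cardiac disease": ["myocardial infarction", "arrhythmia"],
            "myocardial infarction": ["STEMI"]}

def expand(term):
    """Ontology-driven query expansion: the term plus all narrower terms, recursively."""
    terms = [term]
    for child in ontology.get(term, []):
        terms.extend(expand(child))
    return terms

print(expand("cardiac disease"))
```

A search for "cardiac disease" then also retrieves records mentioning only "STEMI", which is how ontology knowledge improves recall on heterogeneous clinical data.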
Sentimental classification analysis of polarity multi-view textual data using...IJECEIAES
The data and information available in most community environments is complex in nature. Sentimental data resources may possibly consist of textual data collected from multiple information sources with different representations and usually handled by different analytical models. These types of data resource characteristics can form multi-view polarity textual data. However, knowledge creation from this type of sentimental textual data requires considerable analytical efforts and capabilities. In particular, data mining practices can provide exceptional results in handling textual data formats. Besides, in the case of the textual data exists as multi-view or unstructured data formats, the hybrid and integrated analysis efforts of text data mining algorithms are vital to get helpful results. The objective of this research is to enhance the knowledge discovery from sentimental multi-view textual data which can be considered as unstructured data format to classify the polarity information documents in the form of two different categories or types of useful information. A proposed framework with integrated data mining algorithms has been discussed in this paper, which is achieved through the application of X-means algorithm for clustering and HotSpot algorithm of association rules. The analysis results have shown improved accuracies of classifying the sentimental multi-view textual data into two categories through the application of the proposed framework on online polarity user-reviews dataset upon a given topics.
An Empirical Study of the Applications of Classification Techniques in Studen...IJERA Editor
University servers and databases store a huge amount of data including personal details, registration details, evaluation assessment, performance profiles, and many more for students and lecturers alike. main problem that faces any system administration or any users is data increasing per-second, which is stored in different type and format in the servers, learning about students from a huge amount of data including personal details, registration details, evaluation assessment, performance profiles, and many more for students and lecturers alike. Graduation and academic information in the future and maintaining structure and content of the courses according to their previous results become importance. The paper objectives are extract knowledge from incomplete data structure and what the suitable method or technique of data mining to extract knowledge from a huge amount of data about students to help the administration using technology to make a quick decision. Data mining aims to discover useful information or knowledge by using one of data mining techniques, this paper used classification technique to discover knowledge from student’s server database, where all students’ information were registered and stored. The classification task is used, the classifier tree C4.5, to predict the final academic results, grades, of students. We use classifier tree C4.5 as the method to classify the grades for the students .The data include four years period [2006-2009]. Experiment results show that classification process succeeded in training set. Thus, the predicted instances is similar to the training set, this proves the suggested classification model. Also the efficiency and effectiveness of C4.5 algorithm in predicting the academic results, grades, classification is very good. The model also can improve the efficiency of the academic results retrieving and evidently promote retrieval precision.
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION cscpconf
While designing a new type of engineering material one has to search for some existing
materials which suits design requirement and then he can try to produce new kind of
engineering material. This selection process itself is tedious as he has to select few numbers of
materials out of a set of lakhs of materials. Therefore in this paper a model is proposed to select
a particular material which suits the user requirement, by using some similarity/distance
measuring functionalities. Here thirteen different types of similarity/distance measuring
functionalities are examined. Performance Index Measure(PIM) is calculated to verify the
relative performance of the selected material with the target material. Then all the results are
normalised for the purpose of analysing the results. Hence the proposed model reduces the
wastage of time in selection and also avoids the haphazardly selection of the materials in materials design and manufacturing industries.
03 fauzi indonesian 9456 11nov17 edit septianIAESIJEECS
Since the rise of WWW, information available online is growing rapidly. One of the example is Indonesian online news. Therefore, automatic text classification became very important task for information filtering. One of the major issue in text classification is its high dimensionality of feature space. Most of the features are irrelevant, noisy, and redundant, which may decline the accuracy of the system. Hence, feature selection is needed. Maximal Marginal Relevance for Feature Selection (MMR-FS) has been proven to be a good feature selection for text with many redundant features, but it has high computational complexity. In this paper, we propose a two-phased feature selection method. In the first phase, to lower the complexity of MMR-FS we utilize Information Gain first to reduce features. This reduced feature will be selected using MMR-FS in the second phase. The experiment result showed that our new method can reach the best accuracy by 86%. This new method could lower the complexity of MMR-FS but still retain its accuracy.
Data mining or Knowledge discovery (KDD) is
extracting unknown (hidden) and useful knowledge from data.
Data mining is widely used in many areas like retail, sales, ecommerce,
remote sensing, bioinformatics etc. Student’s
performance has become one of the most complex puzzle for
universities and colleges in recent past with the tremendous
growth. In this paper, authors deployed data mining techniques
like classification, association rule, chi-square etc. for knowledge
discovery. For this study, authors have used data set containing
Approx. 180 MCA (post graduate) students results data of 3
colleges. Study found that one can apply data mining
functionalities like Chi-square, Association rule and Lift in
Education and discover areas of improvement.
Performance Evaluation of Query Processing Techniques in Information Retrievalidescitation
The first element of the search process is the query.
The user query being on an average restricted to two or three
keywords makes the query ambiguous to the search engine.
Given the user query, the goal of an Information Retrieval
[IR] system is to retrieve information which might be useful
or relevant to the information need of the user. Hence, the
query processing plays an important role in IR system.
The query processing can be divided into four categories
i.e. query expansion, query optimization, query classification and
query parsing. In this paper an attempt is made to evaluate the
performance of query processing algorithms in each of the
category. The evaluation was based on dataset as specified by
Forum for Information Retrieval [FIRE15]. The criteria used
for evaluation are precision and relative recall. The analysis is
based on the importance of each step in query processing. The
experimental results show that the significance of each step
in query processing and also the relevance of web semantics
and spelling correction in the user query.
PREDICTING ACADEMIC MAJOR OF STUDENTS USING BAYESIAN NETWORKS TO THE CASE OF ...ijcax
In this study, which took place current year in the city of Maragheh in IRAN. Number of high school students in the fields of study: mathematics, Experimental Sciences, humanities, vocational, business and science were studied and compared. The purpose of this research is to predict the academic major of high school students using Bayesian networks. The effective factors have been used in academic major selection for the first time as an effective indicator of Bayesian networks. Evaluation of Impacts of indicators on each other, discretization data and processing them was performed by GeNIe. The proper course would be advised for students to continue their education.
Quantitative and Qualitative Analysis of Time-Series Classification using Dee...Nader Ale Ebrahim
Time-series classification is utilized in a variety of applications leading to the development of many data mining techniques for time-series analysis. Among the broad range of time-series classification algorithms, recent studies are considering the impact of deep learning methods on time-series classification tasks. The quantity of related publications requires a bibliometric study to explore most prominent keywords, countries, sources and research clusters. The paper conducts a bibliometric analysis on related publications in time-series classification, adopted from Scopus database between 2010 and 2019. Through keywords co-occurrence analysis, a visual network structure of top keywords in time-series classification research has been produced and deep learning has been introduced as the most common topic by additional inquiry of the bibliography. The paper continues by exploring the publication trends of recent deep learning approaches for time-series classification. The annual number of publications, the productive and collaborative countries, the growth rate of sources, the most occurred keywords and the research collaborations are revealed from the bibliometric analysis within the study period. The research field has been broken down into three main categories as different frameworks of deep neural networks, different applications in remote sensing and also in signal processing for time-series classification tasks. The qualitative analysis highlights the categories of top citation rate papers by describing them in details.
DEVELOPING A CASE-BASED RETRIEVAL SYSTEM FOR SUPPORTING IT CUSTOMERSIJCSEA Journal
Case-based reasoning (CBR) is a modern approach adopted for designing knowledge-based expert systems. The approach depends heavily on stored experiences, which serve as cases that can be employed for solving new problems by retrieving similar cases from the system and reusing their solutions. The present study aims to shed light on a CBR application: developing a new system for assisting Internet users and solving the problems those users face. In addition, the present study focuses on the case retrieval stage. It aims at designing and building an experience-inquiry system for solving any problem that Internet users might face through a case-based reasoning (CBR) dialogue, enabling them to solve the problems encountered when using the Internet. The system developed by the researcher operates by displaying the similar cases through a dialogue, using a well-developed algorithm and drawing on the relevant previous studies. It was found that the success rate of the proposed system is high.
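The case retrieval stage described above can be illustrated with a simple similarity ranking over stored cases. This is a hedged sketch assuming cases keyed by a hypothetical `symptoms` attribute and Jaccard overlap as the similarity measure; the study's actual retrieval algorithm may differ.

```python
def retrieve_similar(query_symptoms, case_base, k=3):
    """Return the k stored cases most similar to the new problem,
    using Jaccard overlap between symptom sets (a common CBR choice)."""
    q = set(query_symptoms)

    def sim(case):
        s = set(case["symptoms"])
        union = q | s
        return len(q & s) / len(union) if union else 0.0

    return sorted(case_base, key=sim, reverse=True)[:k]

# hypothetical case base of past internet problems
cases = [
    {"symptoms": ["no_internet", "slow_dns"], "solution": "restart router"},
    {"symptoms": ["popups"], "solution": "install ad blocker"},
]
best = retrieve_similar(["no_internet", "slow_dns"], cases)[0]
print(best["solution"])  # restart router
```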
Data Mining Application in Advertisement Management of Higher Educational Ins...ijcax
In recent years, competition among Indian higher educational institutes to attract students for enrollment has grown rapidly. To attract students, educational institutes select the best advertisement method. Different advertisements are available in the market, but selecting among them is very difficult
for institutes. This paper helps institutes select the best advertisement medium using data mining methods.
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM : ...IJNSA Journal
In health research, one of the major tasks is to retrieve and analyze heterogeneous databases containing
a single patient's information gathered from a large volume of data over a long period of time. The
main objective of this paper is to present our ontology-based information retrieval approach for
a clinical information system. We performed a case study in a real-life hospital setting. The results
obtained illustrate the feasibility of the proposed approach, which significantly improved the information
retrieval process on a large volume of data collected over a long period, from August 2011 until January
2012.
Similar to Vertical intent prediction approach based on Doc2vec and convolutional neural networks for improving vertical selection in aggregated search
An Extensible Web Mining Framework for Real KnowledgeIJEACS
With the emergence of Web 2.0 applications that bestow a rich user experience and convenience without time and geographical restrictions, web usage logs became a goldmine to researchers across the globe. User behavior analysis in different domains based on web logs helps enterprises with strategic decision making. The rationale behind this is that customers have alternatives and there is intense competition; business growth of enterprises depends on customer-centric approaches that require knowledge of customer behavior to succeed. Therefore the business community needs business intelligence to make expert decisions besides focusing on customer relationship management. Many researchers have contributed towards this end. However, there remains a need for a comprehensive framework that caters to the needs of businesses to ascertain the real needs of web users. This paper presents a framework named eXtensible Web Usage Mining Framework (XWUMF) for discovering actionable knowledge from web log data. The framework employs a hybrid approach that exploits fuzzy clustering methods and methods for user behavior analysis. Moreover, the framework is extensible, as it can accommodate new algorithms for fuzzy clustering and user behavior analysis. We propose an algorithm known as Sequential Web Usage Miner (SWUM) for efficient mining of web usage patterns from different data sets. We built a prototype application to validate our framework. Our empirical results revealed that the framework helps in discovering actionable knowledge.
Machine learning for text document classification-efficient classification ap...IAESIJAI
Numerous alternative methods for text classification have been created because of the increase in the amount of online text information available. The cosine similarity classifier is the most extensively utilized simple and efficient approach. Its text classification performance improves when it is combined with estimated values provided by conventional classifiers such as Multinomial Naive Bayes (MNB). Consequently, combining the similarity between a test document and a category with the estimated value for the category enhances the performance of the classifier. This approach provides a text document categorization method that is both efficient and effective. In addition, methods for determining the proper relationship between a set of words in a document and its document category are also obtained.
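The combination the abstract describes (a similarity score fused with a classifier's estimated value) can be sketched as a weighted sum. This is a minimal illustration assuming bag-of-words `Counter` vectors and a hypothetical mixing weight `alpha`; the NB estimate is passed in as a number rather than computed here.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def combined_score(doc, category_centroid, nb_estimate, alpha=0.5):
    """Fuse document-category similarity with a conventional
    classifier's estimate (e.g. an MNB posterior) for that category."""
    return alpha * cosine(doc, category_centroid) + (1 - alpha) * nb_estimate

# hypothetical document and category centroid
doc = Counter({"goal": 2, "match": 1})
sports = Counter({"goal": 1, "match": 1, "team": 1})
score = combined_score(doc, sports, nb_estimate=0.8)
```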
A systematic mapping study of performance analysis and modelling of cloud sys...IJECEIAES
Cloud computing is a paradigm that uses utility-driven models to provide dynamic services to clients at all levels. Performance analysis and modelling are essential because of service level agreement guarantees. Studies on performance analysis and modelling in the cloud landscape are increasing productively, on issues like virtual machines and data storage. The objective of this study is to conduct a systematic mapping study of performance analysis and modelling of cloud systems and applications. A systematic mapping study is useful for visualizing and summarizing the research carried out in an area of interest. The systematic study provided an overview of studies on this subject by using a structure based on categorization. The results are presented in terms of research types, such as evaluation and solution, and contribution types, such as the tools and methods utilized. The results showed that there were more discussions on optimization in relation to tool, method, and process, with 6.42%, 14.29%, and 7.62% respectively. In addition, analysis based on designs in terms of model had 14.29%, and publications relating to optimization in terms of evaluation research had 9.77%, validation 7.52%, experience 3.01%, and solution 10.51%. Research gaps were identified and should motivate researchers to pursue further research directions.
Multi Similarity Measure based Result Merging Strategies in Meta Search EngineIDES Editor
In a Meta Search Engine, result merging is the key
component. Meta Search Engines provide a uniform query
interface for Internet users to search for information.
Depending on users' needs, they select relevant sources and
map user queries into the target search engines, subsequently
merging the results. The effectiveness of a Meta Search
Engine is closely related to the result merging algorithm it
employs. In this paper, we propose a Meta Search
Engine with two distinct steps: (1) searching through
surface and deep search engines, and (2) ranking the results
with the designed ranking algorithm. Initially, the query
given by the user is input to the deep and surface search
engines. The proposed method uses two distinct algorithms
for ranking the search results: a concept similarity based
method and a cosine similarity based method. Once the results
from the various search engines are ranked, the proposed Meta
Search Engine merges them into a single ranked list. Finally,
experiments were conducted to demonstrate the efficiency of
the proposed visible and invisible web-based Meta Search
Engine in merging the relevant pages. TSAP is used as the
evaluation criterion and the algorithms are evaluated based on
this criterion.
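The final merging step can be sketched as per-engine score normalisation followed by summation per document. This is a minimal illustration with made-up scores; the paper's own merging algorithm, built on concept and cosine similarity, is more elaborate.

```python
def merge_results(ranked_lists):
    """Merge (doc_id, score) lists from several engines into one ranking:
    min-max normalise scores within each engine, then sum per document."""
    merged = {}
    for results in ranked_lists:
        scores = [s for _, s in results]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0
        for doc, s in results:
            merged[doc] = merged.get(doc, 0.0) + (s - lo) / span
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

# hypothetical results from a surface and a deep search engine
surface = [("p1", 0.9), ("p2", 0.5), ("p3", 0.1)]
deep = [("p2", 12.0), ("p4", 6.0), ("p1", 3.0)]
print(merge_results([surface, deep])[0][0])  # p2
```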
Building a recommendation system based on the job offers extracted from the w...IJECEIAES
Recruitment, or job search, is increasingly used throughout the world by a large population of users through various channels, such as websites, platforms, and professional networks. Given the large volume of information related to job descriptions and user profiles, it is complicated to appropriately match a user's profile with a job description, and vice versa. The traditional job search approach has drawbacks, since the job seeker needs to search for job offers on each recruitment platform, manage multiple accounts, and apply for the relevant job vacancies, which wastes considerable time and effort. The contribution of this research work is the construction of a recommendation system based on the job offers extracted from the web and on the e-portfolios of job seekers. After the extraction of the data, natural language processing is applied so that the data are structured and ready for filtering and analysis. The proposed system is a content-based system: it measures the degree of correspondence between the attributes of the e-portfolio and those of each job offer in the same list of competence specialties using the Euclidean distance, and the results are sorted in decreasing order of relevance so that the most relevant job offers are displayed first.
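The matching step can be sketched with plain Euclidean distance over numeric attribute vectors. This is a hedged illustration assuming hypothetical competence-level vectors; the actual system operates on NLP-structured e-portfolio and job-offer attributes.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_offers(profile, offers):
    """Sort job offers by increasing distance to the seeker's profile,
    so the most relevant offers come first."""
    return sorted(offers.items(), key=lambda kv: euclidean(profile, kv[1]))

profile = [3, 4, 2]  # hypothetical competence levels from an e-portfolio
offers = {"offer_a": [3, 4, 2], "offer_b": [1, 1, 1]}
print(rank_offers(profile, offers)[0][0])  # offer_a
```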
A Generic Model for Student Data Analytic Web Service (SDAWS)Editor IJCATR
Any university management system accumulates a large amount of data, and analytics can be applied to it to gather useful
information to aid the academic decision-making process. This paper is a novel attempt to demonstrate the significance of a data
analytic web service in the education domain. It can be integrated easily with the University Management System or any other application
of the university. Analytics as a web service offers many benefits over traditional analysis methods. The web service can be
hosted on a web server and accessed over the internet or on the private cloud of the campus. Data from courses in
different departments can be uploaded and analyzed easily. In this paper we design a web service framework for educational
data mining that provides analysis as a service.
An effective search on web log from most popular downloaded contentijdpsjournal
A Web page recommender system effectively predicts the best related web page to search. While searching for a word, a search engine may display unnecessary links and unrelated data to the user; to avoid this problem, the conceptual prediction model combines both web usage and domain knowledge. The proposed conceptual prediction model automatically generates a semantic network of the semantic Web usage knowledge, which is the integration of domain knowledge and web usage information. Web usage mining aims to discover interesting and frequent user access patterns from web browsing data. The discovered knowledge can then be used for many practical web applications such as web recommendations, adaptive web sites, and personalized web search and surfing.
A Framework for Automated Association Mining Over Multiple DatabasesGurdal Ertek
Literature on association mining, the data mining methodology that investigates associations between items, has primarily focused on efficiently mining larger databases. The motivation for association mining is to use the rules obtained from historical data to influence future transactions. However, associations in transactional processes change significantly over time, implying that rules extracted for a given time interval may not be applicable for a later time interval. Hence, an
analysis framework is necessary to identify how associations change over time. This paper presents such a framework, reports the implementation of the framework as a tool, and demonstrates the applicability of and the necessity for the framework through a case study in the domain of finance.
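Measuring a rule's support and confidence per time window, as such a framework requires, can be sketched as follows. This is a minimal illustration; the window splitting and the rule mining itself are outside its scope.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support and confidence of the rule A -> B in one time window;
    computing these per window reveals how associations drift over time."""
    a = set(antecedent)
    ab = a | set(consequent)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_ab = sum(1 for t in transactions if ab <= set(t))
    support = n_ab / len(transactions)
    confidence = n_ab / n_a if n_a else 0.0
    return support, confidence

# hypothetical transactions in one time window
window = [["x", "y"], ["x"], ["x", "y"], ["z"]]
s, c = rule_metrics(window, ["x"], ["y"])
print(s)  # 0.5
```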
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...IJEACS
The huge amount of library data stored in the modern research and statistics centers of organizations grows on a daily basis. These databases grow exponentially in size with time, and it becomes exceptionally difficult to understand the behavior of the data and interpret the relationships that exist between attributes. This exponential growth of data poses new organizational challenges: the conventional record management infrastructure can no longer cope to give precise and detailed information about the behavior of data over time. There is confusion and novel concern in selecting tools that can support and handle multi-dimensional big data visualization. Viewing all related data at once in a database is a problem that has attracted the interest of data professionals with machine learning skills. This is a lingering issue in the data industry because the existing techniques cannot be used to remove or filter noise from relevant data and pad missing values in order to get the required information. The aim is to develop a stacked generalization model that combines the functionality of random forest and decision tree techniques to visualize a library database. In this paper, random forest and decision tree techniques were employed to effectively visualize large amounts of school library data. The proposed system was implemented with a few lines of Python code to create visualizations that can help users understand and interpret, at a glance, the behavior of data and its relationships. The model was trained and tested to learn and extract hidden patterns of data with a cross-validation test. It combined the functionalities of both models to form a stacked generalization model that performed better than the individual techniques.
The stacked model produced 95%, followed by the RF, which produced a 95% accuracy rate and a 0.223600 RMSE value, in comparison with the DT, which recorded an 80.00% success rate and a 0.15990 RMSE value.
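The stacking idea above (base-model predictions fed to a level-1 combiner) can be reduced to a few lines at prediction time. This is a hedged sketch with toy stand-ins for the trained random forest and decision tree and with pre-fitted meta-weights; the paper's models are, of course, trained on real library data.

```python
def stacked_predict(x, base_models, meta_weights, intercept=0.0):
    """Stacked generalization at prediction time: collect each base
    model's output and combine them with a linear meta-model."""
    level1 = [model(x) for model in base_models]
    return intercept + sum(w * p for w, p in zip(meta_weights, level1))

rf = lambda x: 2.0 * x    # hypothetical random-forest prediction rule
dt = lambda x: x + 1.0    # hypothetical decision-tree prediction rule
print(stacked_predict(3.0, [rf, dt], [0.6, 0.4]))  # ≈ 5.2
```

In a full implementation the meta-weights would be fitted on out-of-fold predictions of the base models to avoid leaking training data into the level-1 learner.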
Novel holistic architecture for analytical operation on sensory data relayed...IJECEIAES
With the increasing adoption of sensor-based applications, there is an exponential rise in sensory data that eventually takes the shape of big data. However, the practicality of executing high-end analytical operations over resource-constrained big data has never been studied closely. After reviewing existing approaches, it is found that there are no cost-effective schemes for big data analytics over large-scale sensory data processing that can be directly used as a service. Therefore, the proposed system introduces a holistic architecture where streamed data, after knowledge extraction, can be offered in the form of services. Implemented in MATLAB, the proposed study uses a very simplistic approach considering the energy constraints of the sensor nodes, and finds that the proposed system offers better accuracy, reduced mining duration (i.e. faster response time), and reduced memory dependencies, proving that it offers a cost-effective analytical solution in contrast to existing systems.
Similar to Vertical intent prediction approach based on Doc2vec and convolutional neural networks for improving vertical selection in aggregated search (20)
Bibliometric analysis highlighting the role of women in addressing climate ch...IJECEIAES
Fossil fuel consumption increased quickly, contributing to climate change
that is evident in unusual flooding, droughts, and global warming. Over
the past ten years, women's involvement in society has grown dramatically,
and they have succeeded in playing a noticeable role in reducing climate
change. A bibliometric analysis of data from the last ten years has been
carried out to examine the role of women in addressing climate change. The
analysis's findings are discussed in relation to the sustainable
development goals (SDGs), particularly SDG 7 and SDG 13. The results
consider contributions made by women in various sectors while taking
geographic dispersion into account. The bibliometric analysis delves into
topics including women's leadership in environmental groups, their
involvement in policymaking, their contributions to sustainable development
projects, and the influence of gender diversity on attempts to mitigate
climate change. This study's results highlight how women have influenced
policies and actions related to climate change, point out areas of research
deficiency, and make recommendations on how to increase the role of women
in addressing climate change and achieving sustainability. To achieve more
successful results, this work aims to highlight the significance of gender
equality and encourage inclusivity in climate-change decision-making
processes.
Voltage and frequency control of microgrid in presence of micro-turbine inter...IJECEIAES
The active and reactive load changes have a significant impact on voltage
and frequency. In this paper, in order to stabilize the microgrid (MG) against
load variations in islanding mode, the active and reactive power of all
distributed generators (DGs), including energy storage (battery), diesel
generator, and micro-turbine, are controlled. The micro-turbine generator is
connected to MG through a three-phase to three-phase matrix converter, and
the droop control method is applied for controlling the voltage and
frequency of MG. In addition, a method is introduced for voltage and
frequency control of micro-turbines in the transition state from grid-connected mode to islanding mode. A novel switching strategy of the matrix
converter is used for converting the high-frequency output voltage of the
micro-turbine to the grid-side frequency of the utility system. Moreover,
using the switching strategy, the low-order harmonics in the output current
and voltage are not produced, and consequently, the size of the output filter
would be reduced. In fact, the suggested control strategy is load-independent
and has no frequency conversion restrictions. The proposed approach for
voltage and frequency regulation demonstrates exceptional performance and
favorable response across various load alteration scenarios. The suggested
strategy is examined in several scenarios in the MG test systems, and the
simulation results are addressed.
Enhancing battery system identification: nonlinear autoregressive modeling fo...IJECEIAES
Precisely characterizing Li-ion batteries is essential for optimizing their
performance, enhancing safety, and prolonging their lifespan across various
applications, such as electric vehicles and renewable energy systems. This
article introduces an innovative nonlinear methodology for system
identification of a Li-ion battery, employing a nonlinear autoregressive with
exogenous inputs (NARX) model. The proposed approach integrates the
benefits of nonlinear modeling with the adaptability of the NARX structure,
facilitating a more comprehensive representation of the intricate
electrochemical processes within the battery. Experimental data collected
from a Li-ion battery operating under diverse scenarios are employed to
validate the effectiveness of the proposed methodology. The identified
NARX model exhibits superior accuracy in predicting the battery's behavior
compared to traditional linear models. This study underscores the
importance of accounting for nonlinearities in battery modeling, providing
insights into the intricate relationships between state-of-charge, voltage, and
current under dynamic conditions.
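A NARX model predicts y(k) from lagged outputs and lagged exogenous inputs. The regressor construction can be sketched as below, with hypothetical lag orders and sample values; the article's actual model adds a nonlinear mapping (e.g. a neural network) on top of these features.

```python
def narx_features(u, y, nu=2, ny=2):
    """Build NARX regressors: each row holds y(k-1..k-ny) and
    u(k-1..k-nu), with y(k) as the prediction target."""
    rows, targets = [], []
    for k in range(max(nu, ny), len(y)):
        row = [y[k - i] for i in range(1, ny + 1)]
        row += [u[k - i] for i in range(1, nu + 1)]
        rows.append(row)
        targets.append(y[k])
    return rows, targets

current = [1, 2, 3, 4]      # hypothetical input samples u(k)
voltage = [10, 11, 12, 13]  # hypothetical output samples y(k)
X, t = narx_features(current, voltage)
print(X[0], t[0])  # [11, 10, 2, 1] 12
```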
Smart grid deployment: from a bibliometric analysis to a surveyIJECEIAES
Smart grids are one of the last decades' innovations in electrical energy.
They bring relevant advantages compared to the traditional grid and
significant interest from the research community. Assessing the field's
evolution is essential to propose guidelines for facing new and future smart
grid challenges. In addition, knowing the main technologies involved in the
deployment of smart grids (SGs) is important to highlight possible
shortcomings that can be mitigated by developing new tools. This paper
contributes to the research trends mentioned above by focusing on two
objectives. First, a bibliometric analysis is presented to give an overview of
the current research level about smart grid deployment. Second, a survey of
the main technological approaches used for smart grid implementation and
their contributions are highlighted. To that effect, we searched the Web of
Science (WoS) and Scopus databases. We obtained 5,663 documents
from WoS and 7,215 from Scopus on smart grid implementation or
deployment. With the extraction limitation in the Scopus database, 5,872 of
the 7,215 documents were extracted using a multi-step process. These two
datasets have been analyzed using a bibliometric tool called bibliometrix.
The main outputs are presented with some recommendations for future
research.
Use of analytical hierarchy process for selecting and prioritizing islanding ...IJECEIAES
One of the problems associated with power systems is the islanding
condition, which must be rapidly and properly detected to prevent any
negative consequences for the system's protection, stability, and security.
This paper offers a thorough overview of several islanding detection
strategies, which are divided into two categories: classic approaches,
including local and remote approaches, and modern techniques, including
techniques based on signal processing and computational intelligence.
Additionally, each approach is compared and assessed based on several
factors, including implementation costs, non-detected zones, declining
power quality, and response times using the analytical hierarchy process
(AHP). The multi-criteria decision-making analysis yields overall
weights of passive methods (24.7%), active methods (7.8%), hybrid methods
(5.6%), remote methods (14.5%), signal processing-based methods (26.6%),
and computational intelligence-based methods (20.8%) when all criteria
are compared together. Thus, it can be seen from the total weight
that hybrid approaches are the least suitable to be chosen, while signal
processing-based methods are the most appropriate islanding detection
method to be selected and implemented in power system with respect to the
aforementioned factors. Using Expert Choice software, the proposed
hierarchy model is studied and examined.
Enhancing of single-stage grid-connected photovoltaic system using fuzzy logi...IJECEIAES
The power generated by photovoltaic (PV) systems is influenced by
environmental factors. This variability hampers the control and utilization of
solar cells' peak output. In this study, a single-stage grid-connected PV
system is designed to enhance power quality. Our approach employs fuzzy
logic in the direct power control (DPC) of a three-phase voltage source
inverter (VSI), enabling seamless integration of the PV connected to the
grid. Additionally, a fuzzy logic-based maximum power point tracking
(MPPT) controller is adopted, which outperforms traditional methods like
incremental conductance (INC) in enhancing solar cell efficiency and
minimizing the response time. Moreover, the inverter's real-time active and
reactive power is directly managed to achieve a unity power factor (UPF).
The system's performance is assessed through MATLAB/Simulink
implementation, showing marked improvement over conventional methods,
particularly in steady-state and varying weather conditions. For solar
irradiances of 500 and 1,000 W/m², the results show that the proposed
method reduces the total harmonic distortion (THD) of the injected current
to the grid by approximately 46% and 38% compared to conventional
methods, respectively. Furthermore, we compare the simulation results with
IEEE standards to evaluate the system's grid compatibility.
Enhancing photovoltaic system maximum power point tracking with fuzzy logic-b...IJECEIAES
Photovoltaic systems have emerged as a promising energy resource that
caters to the future needs of society, owing to their renewable, inexhaustible,
and cost-free nature. The power output of these systems relies on solar cell
radiation and temperature. In order to mitigate the dependence on
atmospheric conditions and enhance power tracking, a conventional
approach has been improved by integrating various methods. To optimize
the generation of electricity from solar systems, the maximum power point
tracking (MPPT) technique is employed. To overcome limitations such as
steady-state voltage oscillations and improve transient response, two
traditional MPPT methods, namely fuzzy logic controller (FLC) and perturb
and observe (P&O), have been modified. This research paper aims to
simulate and validate the step size of the proposed modified P&O and FLC
techniques within the MPPT algorithm using MATLAB/Simulink for
efficient power tracking in photovoltaic systems.
Adaptive synchronous sliding control for a robot manipulator based on neural ...IJECEIAES
Robot manipulators have become important equipment in production lines, medical fields, and transportation. Improving the quality of trajectory tracking for
robot hands is always an attractive topic in the research community. This is a
challenging problem because robot manipulators are complex nonlinear systems
and are often subject to fluctuations in loads and external disturbances. This
article proposes an adaptive synchronous sliding control scheme to improve trajectory tracking performance for a robot manipulator. The proposed controller
ensures that the positions of the joints track the desired trajectory, synchronize
the errors, and significantly reduces chattering. First, the synchronous tracking
errors and synchronous sliding surfaces are presented. Second, the synchronous
tracking error dynamics are determined. Third, a robust adaptive control law is
designed, the unknown components of the model are estimated online by the neural network, and the parameters of the switching elements are selected by fuzzy
logic. The built algorithm ensures that the tracking and approximation errors
are ultimately uniformly bounded (UUB). Finally, the effectiveness of the constructed algorithm is demonstrated through simulation and experimental results.
Simulation and experimental results show that the proposed controller is effective with small synchronous tracking errors, and the chattering phenomenon is
significantly reduced.
Remote field-programmable gate array laboratory for signal acquisition and de...IJECEIAES
A remote laboratory utilizing field-programmable gate array (FPGA) technologies enhances students’ learning experience anywhere and anytime in embedded system design. Existing remote laboratories prioritize hardware access and visual feedback for observing board behavior after programming, neglecting comprehensive debugging tools to resolve errors that require internal signal acquisition. This paper proposes a novel remote embedded-system design approach targeting FPGA technologies that is fully interactive via a web-based platform. Our solution provides FPGA board access and debugging capabilities beyond the visual feedback provided by existing remote laboratories. We implemented a lab module that users can seamlessly incorporate into their FPGA design. The module minimizes hardware resource utilization while enabling the acquisition of a large number of data samples from the signal during experiments by adaptively compressing the signal prior to data transmission. The results demonstrate an average compression ratio of 2.90 across three benchmark signals, indicating efficient signal acquisition and effective debugging and analysis. This method allows users to acquire more data samples than conventional methods. The proposed lab allows students to remotely test and debug their designs, bridging the gap between theory and practice in embedded system design.
Detecting and resolving feature envy through automated machine learning and m...IJECEIAES
Efficiently identifying and resolving code smells enhances software project quality. This paper presents a novel solution, utilizing automated machine learning (AutoML) techniques, to detect code smells and apply move method refactoring. By evaluating code metrics before and after refactoring, we assessed its impact on coupling, complexity, and cohesion. Key contributions of this research include a unique dataset for code smell classification and the development of models using AutoGluon for optimal performance. Furthermore, the study identifies the top 20 influential features in classifying feature envy, a well-known code smell, stemming from excessive reliance on external classes. We also explored how move method refactoring addresses feature envy, revealing reduced coupling and complexity, and improved cohesion, ultimately enhancing code quality. In summary, this research offers an empirical, data-driven approach, integrating AutoML and move method refactoring to optimize software project quality. Insights gained shed light on the benefits of refactoring on code quality and the significance of specific features in detecting feature envy. Future research can expand to explore additional refactoring techniques and a broader range of code metrics, advancing software engineering practices and standards.
Smart monitoring technique for solar cell systems using internet of things ba...IJECEIAES
Rapidly and remotely monitoring and receiving the status parameters of solar cell systems (solar irradiance, temperature, and humidity) are critical issues in enhancing their efficiency. Hence, in the present article an improved smart prototype of an internet of things (IoT) technique based on an embedded system using the NodeMCU ESP8266 (ESP-12E) was carried out experimentally. Three different regions in Egypt (Luxor, Cairo, and El-Beheira) were chosen to study their solar irradiance profile, temperature, and humidity with the proposed IoT system. The monitored data of solar irradiance, temperature, and humidity were visualized live by Ubidots through the hypertext transfer protocol (HTTP). The measured solar power radiation in Luxor, Cairo, and El-Beheira ranged between 216-1000, 245-958, and 187-692 W/m² respectively during the solar day. The accuracy and rapidity of obtaining monitoring results using the proposed IoT system make it a strong candidate for application in monitoring solar cell systems. On the other hand, the obtained solar power radiation results of the three considered regions strongly recommend Luxor and Cairo, rather than El-Beheira, as suitable places to build a solar cell system station.
An efficient security framework for intrusion detection and prevention in int...IJECEIAES
Over the past few years, the internet of things (IoT) has advanced to connect billions of smart devices to improve quality of life. However, anomalies or malicious intrusions pose several security loopholes, leading to performance degradation and threat to data security in IoT operations. Thereby, IoT security systems must keep an eye on and restrict unwanted events from occurring in the IoT network. Recently, various technical solutions based on machine learning (ML) models have been derived towards identifying and restricting unwanted events in IoT. However, most ML-based approaches are prone to miss-classification due to inappropriate feature selection. Additionally, most ML approaches applied to intrusion detection and prevention consider supervised learning, which requires a large amount of labeled data to be trained. Consequently, such complex datasets are impossible to source in a large network like IoT. To address this problem, this proposed study introduces an efficient learning mechanism to strengthen the IoT security aspects. The proposed algorithm incorporates supervised and unsupervised approaches to improve the learning models for intrusion detection and mitigation. Compared with the related works, the experimental outcome shows that the model performs well in a benchmark dataset. It accomplishes an improved detection accuracy of approximately 99.21%.
Developing a smart system for infant incubators using the internet of things ...IJECEIAES
This research is developing an incubator system that integrates the internet of things and artificial intelligence to improve care for premature babies. The system workflow starts with sensors that collect data from the incubator. Then, the data is sent in real-time to the internet of things (IoT) broker eclipse mosquito using the message queue telemetry transport (MQTT) protocol version 5.0. After that, the data is stored in a database for analysis using the long short-term memory network (LSTM) method and displayed in a web application using an application programming interface (API) service. Furthermore, the experimental results produce as many as 2,880 rows of data stored in the database. The correlation coefficient between the target attribute and other attributes ranges from 0.23 to 0.48. Next, several experiments were conducted to evaluate the model-predicted value on the test data. The best results are obtained using a two-layer LSTM configuration model, each with 60 neurons and a lookback setting 6. This model produces an R 2 value of 0.934, with a root mean square error (RMSE) value of 0.015 and a mean absolute error (MAE) of 0.008. In addition, the R 2 value was also evaluated for each attribute used as input, with a result of values between 0.590 and 0.845.
A review on internet of things-based stingless bee's honey production with im...IJECEIAES
Honey is produced exclusively by honeybees and stingless bees which both are well adapted to tropical and subtropical regions such as Malaysia. Stingless bees are known for producing small amounts of honey and are known for having a unique flavor profile. Problem identified that many stingless bees collapsed due to weather, temperature and environment. It is critical to understand the relationship between the production of stingless bee honey and environmental conditions to improve honey production. Thus, this paper presents a review on stingless bee's honey production and prediction modeling. About 54 previous research has been analyzed and compared in identifying the research gaps. A framework on modeling the prediction of stingless bee honey is derived. The result presents the comparison and analysis on the internet of things (IoT) monitoring systems, honey production estimation, convolution neural networks (CNNs), and automatic identification methods on bee species. It is identified based on image detection method the top best three efficiency presents CNN is at 98.67%, densely connected convolutional networks with YOLO v3 is 97.7%, and DenseNet201 convolutional networks 99.81%. This study is significant to assist the researcher in developing a model for predicting stingless honey produced by bee's output, which is important for a stable economy and food security.
A trust based secure access control using authentication mechanism for intero...IJECEIAES
The internet of things (IoT) is a revolutionary innovation in many aspects of our society including interactions, financial activity, and global security such as the military and battlefield internet. Due to the limited energy and processing capacity of network devices, security, energy consumption, compatibility, and device heterogeneity are the long-term IoT problems. As a result, energy and security are critical for data transmission across edge and IoT networks. Existing IoT interoperability techniques need more computation time, have unreliable authentication mechanisms that break easily, lose data easily, and have low confidentiality. In this paper, a key agreement protocol-based authentication mechanism for IoT devices is offered as a solution to this issue. This system makes use of information exchange, which must be secured to prevent access by unauthorized users. Using a compact contiki/cooja simulator, the performance and design of the suggested framework are validated. The simulation findings are evaluated based on detection of malicious nodes after 60 minutes of simulation. The suggested trust method, which is based on privacy access control, reduced packet loss ratio to 0.32%, consumed 0.39% power, and had the greatest average residual energy of 0.99 mJoules at 10 nodes.
Fuzzy linear programming with the intuitionistic polygonal fuzzy numbersIJECEIAES
In real world applications, data are subject to ambiguity due to several factors; fuzzy sets and fuzzy numbers propose a great tool to model such ambiguity. In case of hesitation, the complement of a membership value in fuzzy numbers can be different from the non-membership value, in which case we can model using intuitionistic fuzzy numbers as they provide flexibility by defining both a membership and a non-membership functions. In this article, we consider the intuitionistic fuzzy linear programming problem with intuitionistic polygonal fuzzy numbers, which is a generalization of the previous polygonal fuzzy numbers found in the literature. We present a modification of the simplex method that can be used to solve any general intuitionistic fuzzy linear programming problem after approximating the problem by an intuitionistic polygonal fuzzy number with n edges. This method is given in a simple tableau formulation, and then applied on numerical examples for clarity.
The performance of artificial intelligence in prostate magnetic resonance im...IJECEIAES
Prostate cancer is the predominant form of cancer observed in men worldwide. The application of magnetic resonance imaging (MRI) as a guidance tool for conducting biopsies has been established as a reliable and well-established approach in the diagnosis of prostate cancer. The diagnostic performance of MRI-guided prostate cancer diagnosis exhibits significant heterogeneity due to the intricate and multi-step nature of the diagnostic pathway. The development of artificial intelligence (AI) models, specifically through the utilization of machine learning techniques such as deep learning, is assuming an increasingly significant role in the field of radiology. In the realm of prostate MRI, a considerable body of literature has been dedicated to the development of various AI algorithms. These algorithms have been specifically designed for tasks such as prostate segmentation, lesion identification, and classification. The overarching objective of these endeavors is to enhance diagnostic performance and foster greater agreement among different observers within MRI scans for the prostate. This review article aims to provide a concise overview of the application of AI in the field of radiology, with a specific focus on its utilization in prostate MRI.
Seizure stage detection of epileptic seizure using convolutional neural networksIJECEIAES
According to the World Health Organization (WHO), seventy million individuals worldwide suffer from epilepsy, a neurological disorder. While electroencephalography (EEG) is crucial for diagnosing epilepsy and monitoring the brain activity of epilepsy patients, it requires a specialist to examine all EEG recordings to find epileptic behavior. This procedure needs an experienced doctor, and a precise epilepsy diagnosis is crucial for appropriate treatment. To identify epileptic seizures, this study employed a convolutional neural network (CNN) based on raw scalp EEG signals to discriminate between preictal, ictal, postictal, and interictal segments. The possibility of these characteristics is explored by examining how well timedomain signals work in the detection of epileptic signals using intracranial Freiburg Hospital (FH), scalp Children's Hospital Boston-Massachusetts Institute of Technology (CHB-MIT) databases, and Temple University Hospital (TUH) EEG. To test the viability of this approach, two types of experiments were carried out. Firstly, binary class classification (preictal, ictal, postictal each versus interictal) and four-class classification (interictal versus preictal versus ictal versus postictal). The average accuracy for stage detection using CHB-MIT database was 84.4%, while the Freiburg database's time-domain signals had an accuracy of 79.7% and the highest accuracy of 94.02% for classification in the TUH EEG database when comparing interictal stage to preictal stage.
Analysis of driving style using self-organizing maps to analyze driver behaviorIJECEIAES
Modern life is strongly associated with the use of cars, but the increase in acceleration speeds and their maneuverability leads to a dangerous driving style for some drivers. In these conditions, the development of a method that allows you to track the behavior of the driver is relevant. The article provides an overview of existing methods and models for assessing the functioning of motor vehicles and driver behavior. Based on this, a combined algorithm for recognizing driving style is proposed. To do this, a set of input data was formed, including 20 descriptive features: About the environment, the driver's behavior and the characteristics of the functioning of the car, collected using OBD II. The generated data set is sent to the Kohonen network, where clustering is performed according to driving style and degree of danger. Getting the driving characteristics into a particular cluster allows you to switch to the private indicators of an individual driver and considering individual driving characteristics. The application of the method allows you to identify potentially dangerous driving styles that can prevent accidents.
Hyperspectral object classification using hybrid spectral-spatial fusion and ...IJECEIAES
Because of its spectral-spatial and temporal resolution of greater areas, hyperspectral imaging (HSI) has found widespread application in the field of object classification. The HSI is typically used to accurately determine an object's physical characteristics as well as to locate related objects with appropriate spectral fingerprints. As a result, the HSI has been extensively applied to object identification in several fields, including surveillance, agricultural monitoring, environmental research, and precision agriculture. However, because of their enormous size, objects require a lot of time to classify; for this reason, both spectral and spatial feature fusion have been completed. The existing classification strategy leads to increased misclassification, and the feature fusion method is unable to preserve semantic object inherent features; This study addresses the research difficulties by introducing a hybrid spectral-spatial fusion (HSSF) technique to minimize feature size while maintaining object intrinsic qualities; Lastly, a soft-margins kernel is proposed for multi-layer deep support vector machine (MLDSVM) to reduce misclassification. The standard Indian pines dataset is used for the experiment, and the outcome demonstrates that the HSSF-MLDSVM model performs substantially better in terms of accuracy and Kappa coefficient.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
Student information management system project report ii.pdfKamal Acharya
Our project explains about the student management. This project mainly explains the various actions related to student details. This project shows some ease in adding, editing and deleting the student details. It also provides a less time consuming process for viewing, adding, editing and deleting the marks of the students.
Vaccine management system project report documentation..pdfKamal Acharya
The Division of Vaccine and Immunization is facing increasing difficulty monitoring vaccines and other commodities distribution once they have been distributed from the national stores. With the introduction of new vaccines, more challenges have been anticipated with this additions posing serious threat to the already over strained vaccine supply chain system in Kenya.
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdfKamal Acharya
The College Bus Management system is completely developed by Visual Basic .NET Version. The application is connect with most secured database language MS SQL Server. The application is develop by using best combination of front-end and back-end languages. The application is totally design like flat user interface. This flat user interface is more attractive user interface in 2017. The application is gives more important to the system functionality. The application is to manage the student’s details, driver’s details, bus details, bus route details, bus fees details and more. The application has only one unit for admin. The admin can manage the entire application. The admin can login into the application by using username and password of the admin. The application is develop for big and small colleges. It is more user friendly for non-computer person. Even they can easily learn how to manage the application within hours. The application is more secure by the admin. The system will give an effective output for the VB.Net and SQL Server given as input to the system. The compiled java program given as input to the system, after scanning the program will generate different reports. The application generates the report for users. The admin can view and download the report of the data. The application deliver the excel format reports. Because, excel formatted reports is very easy to understand the income and expense of the college bus. This application is mainly develop for windows operating system users. In 2017, 73% of people enterprises are using windows operating system. So the application will easily install for all the windows operating system users. The application-developed size is very low. The application consumes very low space in disk. Therefore, the user can allocate very minimum local disk space for this application.
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...Amil Baba Dawood bangali
Contact with Dawood Bhai Just call on +92322-6382012 and we'll help you. We'll solve all your problems within 12 to 24 hours and with 101% guarantee and with astrology systematic. If you want to take any personal or professional advice then also you can call us on +92322-6382012 , ONLINE LOVE PROBLEM & Other all types of Daily Life Problem's.Then CALL or WHATSAPP us on +92322-6382012 and Get all these problems solutions here by Amil Baba DAWOOD BANGALI
#vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore#blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #blackmagicforlove #blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #Amilbabainuk #amilbabainspain #amilbabaindubai #Amilbabainnorway #amilbabainkrachi #amilbabainlahore #amilbabaingujranwalan #amilbabainislamabad
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxR&R Consult
CFD analysis is incredibly effective at solving mysteries and improving the performance of complex systems!
Here's a great example: At a large natural gas-fired power plant, where they use waste heat to generate steam and energy, they were puzzled that their boiler wasn't producing as much steam as expected.
R&R and Tetra Engineering Group Inc. were asked to solve the issue with reduced steam production.
An inspection had shown that a significant amount of hot flue gas was bypassing the boiler tubes, where the heat was supposed to be transferred.
R&R Consult conducted a CFD analysis, which revealed that 6.3% of the flue gas was bypassing the boiler tubes without transferring heat. The analysis also showed that the flue gas was instead being directed along the sides of the boiler and between the modules that were supposed to capture the heat. This was the cause of the reduced performance.
Based on our results, Tetra Engineering installed covering plates to reduce the bypass flow. This improved the boiler's performance and increased electricity production.
It is always satisfying when we can help solve complex challenges like this. Do your systems also need a check-up or optimization? Give us a call!
Work done in cooperation with James Malloy and David Moelling from Tetra Engineering.
More examples of our work https://www.r-r-consult.dk/en/cases-en/
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 10, No. 4, August 2020: 3869-3882
Identifying the intent behind the user query is a crucial step toward reaching this goal. In a broad sense,
the automatic prediction of user intentions helps in enhancing the user experience by returning more relevant
results to users and adapting these results to their specific needs. Thus, vertical selection involves two main challenges: the diversity of the verticals and the understanding of the user intent.
Figure 1. Aggregated search process [1]
Regarding the first challenge, a variety of heterogeneous vertical search engines exist on the web, and each vertical has its own features. For example, some verticals, like the weather vertical, are not directly searchable by users, so features generated from the vertical query log are not available for this kind of vertical. Likewise, features generated from a vertical corpus are not available for verticals such as the calculator and language translation [2]. Therefore, researchers must deal with the fact that different verticals may require different feature representations when creating new vertical selection approaches.
The second challenge is related to the user intent. A vertical selection system focuses on retrieving relevant verticals whose results are shown to the user. For example, if the user searches the images and news verticals, they specifically need the results of these verticals and are not interested in the content of other verticals such as shopping or weather, even if that content is relevant, because the goal is not only to return relevant results but also to satisfy the user intent.
Existing research papers have studied the problem of vertical selection in different ways [3]. Some prior work focused on constructing models that aim at detecting queries with a content-specific vertical intent, such as shopping [4], news [5], jobs [4], or question answering [6]. Other works on vertical selection [7-10] focused on using traditional machine learning techniques to combine multiple types of features for selecting several relevant verticals. Although these techniques are very efficient, handling vertical selection with high accuracy remains a challenging research task.
In this paper, we are interested in textual queries, since image queries tend to have image results. Our proposed approach for predicting vertical intent first generates query embedding vectors using the doc2vec algorithm, which accurately preserves syntactic and semantic information within each query; we therefore use it as the primary query representation in our vertical selection pipeline. This vector is then used as input to a convolutional neural network (CNN) model that enriches the representation with multiple levels of abstraction comprising rich semantic information and creates a global summarization of the query features. To the best of our knowledge, this is the first time the benefits of deep learning and paragraph vectors have been exploited in the context of vertical selection, which can lead to significant progress in this area.
The remainder of this paper is structured as follows. In the next section, we review the related work
concerning vertical selection. In Section 3, we provide a description of our proposed method. Section 4 is
devoted to the experimental settings. We present and discuss the experimental results in Section 5. Finally,
we conclude the study and discuss future work.
Vertical intent prediction approach based on Doc2vec and convolutional neural networks… (Sanae Achsas)
2. RELATED WORK
Aggregated search can be compared to federated search, which aims to provide integrated search across multiple text collections, referred to as resources, merged into a single ranking list [11]. Similar to aggregated search, federated search is typically decomposed into two sub-tasks: resource selection and results merging. The main difference between federated search and aggregated search lies in the heterogeneity of the data and the presentation of the results.
Regarding resource selection, existing approaches can be categorized into two classes of algorithms: term-based and sample-based. Term-based algorithms represent shards by collection statistics about the terms in the search engine's vocabulary. Callan et al. [12] propose the CORI algorithm, in which a shard is represented by the number of its documents that contain the terms of a vocabulary. The shards are ranked using the INDRI version of the tf.idf score function, with this number used as the frequency of each query term. CORI selects a fixed number of shards from the top of this ranking.
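As a rough illustration of this term-based idea, the sketch below ranks shards by a tf.idf-style score in which a term's "frequency" is its document frequency within the shard. The shard statistics are invented for the example, and the scoring formula is a simplification, not the exact INDRI/CORI function.

```python
import math

# Toy shard statistics: for each shard, how many of its documents
# contain each term (df), plus the shard's document count.
shards = {
    "news":    {"df": {"election": 800, "goal": 40},  "docs": 10_000},
    "sports":  {"df": {"election": 30,  "goal": 900}, "docs": 8_000},
    "weather": {"df": {"election": 5,   "goal": 2},   "docs": 5_000},
}

def cori_like_score(query_terms, shard):
    """tf.idf-style shard score using df as the term 'frequency'."""
    n_shards = len(shards)
    score = 0.0
    for t in query_terms:
        tf = shard["df"].get(t, 0)
        # idf over shards: how many shards contain the term at all
        containing = sum(1 for s in shards.values() if s["df"].get(t, 0) > 0)
        idf = math.log((n_shards + 1) / (containing + 0.5))
        score += (tf / shard["docs"]) * idf
    return score

def select_shards(query_terms, k=2):
    """CORI-style selection: a fixed number of shards from the top."""
    ranked = sorted(shards,
                    key=lambda name: cori_like_score(query_terms, shards[name]),
                    reverse=True)
    return ranked[:k]

print(select_shards(["election"], k=1))  # → ['news']
```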
Another work [13] proposes Taily, a shard selection algorithm that models a query's score distribution in each shard as a Gamma distribution and selects shards with highly scored documents in the tail of the distribution. Taily estimates the parameters of the score distributions from the mean and variance of the score function's features in the collections and shards. Concerning the algorithms of the second category,
they use a central sample index (CSI) of documents from each shard for shard selection. For example, the authors of [14] propose the ReDDE algorithm, which ranks shards according to the number of the highest-ranked documents that belong to each shard, weighted by the ratio between the shard's size and the number of the shard's documents in the CSI. The SUSHI algorithm [15] chooses the best-fitting function, from a list of candidate functions, between the estimated ranks of a shard's documents in the CSI and their observed scores from the initial search. Using this function, the algorithm estimates the scores of the top-ranked documents of each shard and selects shards based on their estimated number of documents among a number of top-ranked documents in the global ranking.
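The sample-based idea behind ReDDE can be sketched as follows. The CSI entries, shard sizes, and precomputed retrieval scores below are invented stand-ins for a real sampled index and query ranking, and the weighting is the simplified size-ratio form described above.

```python
# Each CSI document carries the shard it was sampled from and a
# retrieval score for the current query (toy precomputed scores).
csi = [
    ("images", 0.9), ("news", 0.85), ("images", 0.8),
    ("news", 0.4), ("shopping", 0.3), ("shopping", 0.1),
]
# Shard sizes and number of sampled documents per shard.
shard_size  = {"images": 50_000, "news": 40_000, "shopping": 80_000}
sample_size = {"images": 2, "news": 2, "shopping": 2}

def redde_like(csi, top_n=3):
    """Count each shard's docs among the top-n CSI results, weighted by
    the ratio of shard size to its number of sampled documents."""
    ranked = sorted(csi, key=lambda d: d[1], reverse=True)[:top_n]
    scores = {}
    for shard, _ in ranked:
        weight = shard_size[shard] / sample_size[shard]
        scores[shard] = scores.get(shard, 0.0) + weight
    return sorted(scores, key=scores.get, reverse=True)

print(redde_like(csi))  # → ['images', 'news']
```

Shards with many highly ranked sampled documents, scaled up by how much of the shard the sample represents, rise to the top of the selection.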
On the other hand, identifying the query intent of a user is a well-known problem in information retrieval and has been studied by a considerable number of researchers in this field; its goal is to select relevant documents or web search results that satisfy the user's information need. Most of the approaches cited in the literature employ query logs, supervised or semi-supervised classifiers, different language models, and word embeddings. For example, the authors of [16] developed a new dataset of almost 2,000 queries tagged with informational, navigational, and transactional categories, and calculated features for each of these queries using a real-world query log. Jiang and Yang [17] tackle the problem of query intent inference by integrating multiple information sources in a seamless manner. They first propose a comprehensive data model called Search Query Log Structure (SQLS) that represents the relationships between search queries via the user, URL, session, and term dimensions. They then propose three new frameworks that infer query intents by mining the multidimensional structure constructed from the search query log. Another work [18] explores the combination of lexical statistics and embedding methods. First, a novel term expansion algorithm is designed to sketch all possible intent candidates. Then an efficient query intent generation model is proposed, which learns latent representations for intent candidates via embedding-based methods, and then
vectorized intent candidates are clustered and detected as query intents. Kim et al. [19] used enriched word embeddings that force semantically similar or dissimilar words to be closer together or farther apart in the embedding space, to improve the performance of the intent detection task for spoken language understanding. They exploit several semantic lexicons, such as WordNet, PPDB (the Paraphrase Database), and the Macmillan Dictionary, using them as the initial representation of words for intent detection, and train an end-to-end model that jointly learns the slot and intent classes from the training data by building a bidirectional LSTM (long short-term memory) network. In [20], the proposed approach treats query intent identification as a multi-class classification task and extracts query vector representations using a CNN instead of engineering query features. This method uses deep learning to find query vector representations, then uses them as features to classify queries by intent.
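In the spirit of these embedding-based approaches, a minimal intent detector can represent a query as the average of its word vectors and pick the nearest intent centroid. The tiny hand-made vectors and nearest-centroid rule below are illustrative assumptions; the cited works use learned, lexicon-enriched embeddings and trained LSTM/CNN classifiers instead.

```python
import math

# Toy 2-d word vectors (in the cited works these would be enriched
# embeddings initialized from lexicons such as WordNet or PPDB).
word_vecs = {
    "buy": [1.0, 0.1], "price": [0.9, 0.2], "cheap": [0.8, 0.0],
    "score": [0.1, 1.0], "match": [0.2, 0.9], "game": [0.0, 0.8],
}

def embed(query):
    """Average the vectors of the known words in the query."""
    vs = [word_vecs[w] for w in query.lower().split() if w in word_vecs]
    if not vs:
        return [0.0, 0.0]
    return [sum(col) / len(vs) for col in zip(*vs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.hypot(*a), math.hypot(*b)
    return dot / (na * nb) if na and nb else 0.0

# Intent centroids built from a few labeled example queries.
centroids = {
    "shopping": embed("buy cheap price"),
    "sports":   embed("match game score"),
}

def detect_intent(query):
    q = embed(query)
    return max(centroids, key=lambda c: cosine(q, centroids[c]))

print(detect_intent("cheap game console price"))  # → shopping
```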
As we can see in the approaches reviewed above, the problem of query intent detection has been studied without considering the vertical intent issue. With the appearance of aggregated search, various web search engines no longer return uniform lists of web pages; they also include results of different types from different verticals such as images, news, and videos, which means that the key component of this domain is the notion of verticals. Therefore, an aggregated search system must predict which vertical(s) are most likely to answer the issued query, which reduces the load of querying a large set of verticals and thus improves search effectiveness.
The relevance of a vertical depends on two main factors: the relevance of the documents within
the vertical collection, and the user's intent toward the vertical (vertical intent). For the first factor, to make
ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 10, No. 4, August 2020 : 3869 - 3882
3872
the vertical selection decision, various approaches use traditional machine learning techniques to combine
different sources of evidence, which can be found in query features (features that depend only on the query) [6-9],
vertical features (features that depend only on the vertical) [2, 5, 21, 22], and vertical-query features
(features that measure relationships between the vertical and the query, and are therefore unique to
the vertical-query pair) [2, 7, 22-24]. Among these works, some integrate the content of a single vertical [3].
In this respect, Li et al. [4] address the general problem of vertical selection with an approach that focuses
on the shopping and job verticals, extracting implicit feedback from clickthrough data using semi-supervised
learning. Diaz [5] also investigated the vertical selection problem with respect to the news vertical,
deriving features from the news collection and from web and vertical query logs, and incorporating
click feedback into the model. More recent work in [6] targeted a variant of the vertical selection problem
in which the authors use Community Question Answering (CQA) verticals to detect queries with CQA intent.
Other approaches consider several verticals simultaneously [3]. Arguello et al. [7] propose
a classification-based approach for vertical selection that exploits features from the vertical content,
the query string, and the vertical's query log; click-through data is used to construct a descriptive
language model for each vertical's related queries. Diaz and Arguello [9] also present several algorithms
for combining user feedback with offline classifier information, with the goal of maximizing user
satisfaction by presenting the appropriate vertical display. In another work, Arguello et al. [8] use
training data associated with a set of existing verticals to learn a model that can make vertical selection
predictions for a target vertical. Recent work in the same context was proposed in [10], in which the user's
desired vertical is placed at the top of the web result page by predicting verticals from the user's past behavior.
Regarding the second factor, few works address the vertical intent issue when studying query intent
detection, even though, from the user intent perspective, user vertical intent plays an important role in
improving the aggregated search process. For example, Zhou et al. [25] propose a methodology to predict
the vertical intent of a query from a search engine log by exploiting click-through data. Recent work by
Tsur et al. [6] presents a supervised classification scheme for detecting queries with question intent,
as a variant of the vertical selection problem. They introduce two classification schemes that consider
query structure: in the first, features induced from the query structure feed a supervised linear
classifier; in the second, word clusters and their positions in the query feed a random forest classifier
that identifies discriminative structural elements in the query.
Despite its interesting role, research in this direction is still limited, and the literature on vertical
selection based on user vertical intent remains sparse, especially given the evolution of IR (Information
Retrieval) in recent years. Therefore, in this work we focus on the problem of vertical intent prediction
and propose a new approach that combines the doc2vec algorithm with convolutional neural networks,
exploiting for the first time the benefits of both techniques to improve the vertical selection task.
3. PROPOSED APPROACH
Through the vertical selection process, the query is processed and sent to multiple verticals as well
as to the Web search engine. Deciding which of these should be selected for a given query depends on which
verticals the user intends to retrieve; we refer to this as the user vertical intent (VI), and it can be
defined as follows:
Given a user's query Q and a set of candidate verticals V = {v_1, v_2, ..., v_n}, the vertical intent I_Q
is represented by the vector I_Q = {i_1, i_2, ..., i_n}, where each value i_k indicates the importance of
the vertical v_k to the query Q. For each vertical v_k, given a threshold δ above which the vertical is
assumed to have a high intent for the query, if i_k > δ then the vertical v_k is intended by the query Q.
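The decision rule above can be sketched directly in code; the vertical names, scores, and threshold below are illustrative values, not data from the paper:

```python
# Minimal sketch of the vertical intent decision rule: a vertical v_k is
# intended by the query whenever its intent score i_k exceeds the threshold δ.

def intended_verticals(intent_scores, delta):
    """Return the verticals whose intent score exceeds the threshold delta."""
    return [v for v, score in intent_scores.items() if score > delta]

# Illustrative scores for one query (not values from the paper).
scores = {"images": 0.72, "news": 0.15, "videos": 0.55, "shopping": 0.08}
print(intended_verticals(scores, delta=0.5))  # ['images', 'videos']
```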
To address this problem, we propose an approach based on doc2vec and convolutional neural networks.
First, it generates a query embedding vector using the doc2vec algorithm, which preserves the syntactic
and semantic information within each query. Second, this vector is used as input to a convolutional
neural network model that enriches the representation of the query with multiple levels of abstraction,
including rich semantic information, and then creates a global summarization of the query features. Figure 2
shows the overall architecture of our proposed vertical selection system. It contains two main parts,
the semantic representation of the query and the query-level feature extraction. The subsections below
describe these two parts and all performed steps in depth.
Int J Elec & Comp Eng ISSN: 2088-8708
Vertical intent prediction approach based on Doc2vec and convolutional neural networks… (Sanae Achsas)
3873
Figure 2. Architecture of the proposed vertical selection system
3.1. Semantic representation of the query
The core algorithm of this step is doc2vec, an unsupervised model widely used to construct distributed
representations of arbitrarily long sentences. It is an extension of word2vec that learns fixed-length
feature representations for variable-length pieces of text such as sentences, paragraphs, and documents [26].
Doc2vec, or paragraph vectors, has two different architectures: the Distributed Bag-of-Words model and
the Distributed Memory model. The Distributed Bag-of-Words (DBOW) model trains faster and does not consider
word order; it predicts a random group of words in a paragraph based on the provided paragraph vector.
In the Distributed Memory (DM) model, the paragraph is treated as an extra word, which is then averaged with
the local relevant word vectors to make predictions. This method requires additional computation but can
achieve better results than DBOW.
We chose the doc2vec model because it overcomes the disadvantages of other bag-of-words models by learning
semantic relationships between words, which is why it has been widely used recently in various NLP
tasks [27-29] and Information Retrieval works [30-35], where it has proven able to capture the semantics of
paragraphs and leads to excellent results. In this step, the goal is to learn a good semantic representation
of the input query; following that idea, we represent each query as a vector and use this vector as features
for our classification model, as presented in Figure 2. The doc2vec algorithm thus contributes effectively
to improving the performance of our system.
3.2. Query level feature extraction
Unlike traditional approaches that require handcrafted features to predict relevant verticals, our
approach uses a convolutional neural network to extract the most important semantic features that represent
each query and to discard those that are unnecessary. Recently, CNNs have achieved promising performance
in various NLP tasks, such as information extraction [36], summarization [37], machine translation [38],
classification [39], question answering [40], and other traditional NLP tasks [41-43]. The CNN architecture
used in this paper is shown in Figure 3.
- Input layer: Once the query embedding vectors are generated in the previous step, we use them to
  construct the input matrix needed by the embedding layer, where each query is represented as
  a 2-dimensional matrix.
- Convolutional layer: The primary purpose of this layer is to capture the syntactic and semantic features of
  the entire query and compress these valuable semantics into feature maps. We perform several convolutions
  over the embedded query vectors using multiple filters with different window sizes. As each filter moves
  along the query, various features are produced and combined into a feature map. Activation functions are
  added to incorporate element-wise non-linearity.
6. ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 10, No. 4, August 2020 : 3869 - 3882
3874
- Pooling layer: The goal of this layer is to extract the most relevant features within each feature map;
  we therefore use the max-pooling strategy for the pooling operation. Since there are multiple feature maps,
  we obtain a vector after each pooling operation, and all vectors obtained from the max-pooling layer are
  concatenated into a fixed-length feature vector.
- Fully connected layers: These layers constitute the classification part of our model; their main purpose
  is to take the high-level features obtained from the previous layers and pass them to the final softmax
  layer, which classifies the input query into various classes based on its vertical intent scores.
In addition to these layers, we performed some optimizations to reduce overfitting and obtain better test
accuracy. These include applying an l2 norm constraint on the weight vectors in the convolutional and fully
connected layers, as well as adding several batch normalization layers, which normalize the activations of
the previous layer for each batch during training; this helps train our model faster and consequently
improves its performance.
Figure 3. Illustration of the CNN model architecture
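The layer stack just described (convolutions with several window sizes, max pooling, concatenation, batch normalization, dropout, and an l2-regularized softmax layer) can be sketched in Keras; the input shape and the number of vertical classes below are assumptions for illustration, not values from the paper:

```python
# Illustrative Keras sketch of the CNN architecture described above.
from tensorflow.keras import Model, layers, regularizers

MAX_LEN, EMB_DIM, N_VERTICALS = 12, 300, 5  # assumed shapes for the example

inp = layers.Input(shape=(MAX_LEN, EMB_DIM))  # query as a 2-D embedding matrix
branches = []
for size in (2, 3, 4):  # multiple filter window sizes over the query matrix
    conv = layers.Conv1D(100, size, activation="relu",
                         kernel_regularizer=regularizers.l2(0.01))(inp)
    branches.append(layers.GlobalMaxPooling1D()(conv))  # max pooling per map
feats = layers.Concatenate()(branches)       # fixed-length feature vector
feats = layers.BatchNormalization()(feats)   # normalize activations per batch
feats = layers.Dropout(0.5)(feats)
out = layers.Dense(N_VERTICALS, activation="softmax",
                   kernel_regularizer=regularizers.l2(0.01))(feats)

model = Model(inp, out)
model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])
print(model.output_shape)  # (None, 5)
```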
4. EXPERIMENTS
This section describes the datasets used and the hyperparameters chosen for evaluating the performance
of our system; it also gives details about how the doc2vec and CNN models were trained.
4.1. Datasets
In these experiments, we employ three public datasets for training, validating, and testing our proposed
system, respectively; summary statistics of these datasets are listed in Table 1. We describe each dataset
in detail below:
7. Int J Elec & Comp Eng ISSN: 2088-8708
Vertical intent prediction approach based on Doc2vec and convolutional neural networks… (Sanae Achsas)
3875
For the first dataset, we use the official NTCIR-12 IMine-2 vertical intent collection for English
subtopics [44], which was designed to explore and evaluate technologies for understanding the user intents
behind a query. IMine-2 includes a set of 100 topics (i.e., queries). Each topic is labeled with a set of
intents with probabilities, and there is a set of subtopics serving as relevance judgments for each vertical
intent. A subtopic of a given query is viewed as a search intent that specializes and/or disambiguates
the original query; there are 533 such subtopics. The general idea is to first generate a more complete
representation of the different possible intents associated with the input query, and then to perform
vertical selection for each intent separately.
In addition, this dataset includes five types of queries, namely “ambiguous”, “faceted”, “very clear”,
“task-oriented”, and “vertical-oriented”, which allows us to investigate the performance of our system on
diverse queries and varied topics. The details of the five query types are as follows [44]:
a. Ambiguous: The concepts/objects behind the query are ambiguous (e.g., "Jaguar" -> car, animal, etc.).
b. Faceted: The information needs behind the query include many facets or aspects (e.g., “harry potter” ->
movie, book, Wikipedia, etc.).
c. Very clear: The information need behind the query is so clear that a single relevant document can
usually satisfy it (e.g., “apple.com homepage”).
d. Task-oriented: The search intent behind the query relates to the searcher’s goal (e.g., “lose weight” ->
exercise, healthy food, medicine, etc.).
e. Vertical-oriented: The search intent behind the query strongly indicates a specific vertical (e.g., “iPhone
photo” -> Image vertical).
We use this dataset as training data, from which we construct a validation set by randomly selecting 10%.
The second dataset used in this paper is FedWeb’14 [45], used in the TREC FedWeb track 2014; the collection
contains search result pages from 108 web search engines (e.g., Google, Yahoo!, YouTube, and Wikipedia).
For each engine, 75 test topics were provided, of which 50 are used for vertical selection evaluations.
We conduct a prediction task for these 50 test queries and compare the results obtained with the true
values in the dataset.
The last dataset is the TREC 2009 Million Query Track collection [46], which contains 40,000 queries
sampled from two large query logs. To help anonymize them and select queries with relatively high volume,
they were processed by a filter that converted them into queries with roughly equal frequency in a third
query log. This dataset is used as training data for the doc2vec algorithm to improve its performance with
large data. Moreover, the queries used in these experiments have different lengths, from short to long,
as shown in Figures 4, 5, and 6. The structure of our model allows us to learn from both kinds.
Table 1. Statistical information of the various datasets
Dataset | Description | Size (number of queries) | Max query length (words) | Task
NTCIR-12 IMine-2 English Subtopic Mining [44] | Contains the Query Understanding subtask and the Vertical Incorporating subtask. | 533 | 12 | 90% training / 10% validation
FedWeb’14 [45] | Designed for resource selection, results merging, and vertical selection tasks. | 50 | 9 | Testing
TREC Million Query Track 2009 [46] | An exploration of ad hoc retrieval over a large set of queries and a large collection of documents; investigates questions of system evaluation. | 40,000 | 16 | doc2vec training data
Figure 4. Query length for NTCIR-12 IMine-2
Figure 5. Query length for FedWeb’14
Figure 6. Query length for TREC million query track 2009
4.2. Hyperparameters and training
First, to train and evaluate our proposed system, we need to adjust several hyperparameters. Regarding
the first part of our architecture, to apply doc2vec to our datasets, we first trained a doc2vec model
(the Python gensim implementation) on the TREC 2009 Million Query Track dataset using the Distributed
Memory model. We then transformed all the queries in both the training and testing sets into doc2vec
vectors. The parameters used to train the doc2vec model are shown in Table 2.
For the second part, our network is implemented using the Keras framework with the TensorFlow backend.
For the input layer, we created the input data from Gensim's doc2vec embeddings instead of using a Keras
embedding layer. Building our CNN architecture involves many hyperparameters, so we first consider
the performance of a baseline CNN configuration using the parameters described in Table 3.
Table 2. Parameters used to train the doc2vec model
Parameter | alpha | min_alpha | min_count | Vector size | Number of epochs | Library used
Value     | 0.025 | 0.0025    | 1         | 300         | 100              | Gensim
Table 3. Parameters of the baseline CNN configuration
Filter region sizes   [3-5]
Feature maps          100
l2 regularization     0.01
Dropout rate          0.5
Batch size            64
Optimizer             adam
Activation function   ReLU
Pooling               max pooling
Loss function         mean squared error
Number of epochs      100
Then, we evaluated the effect of each remaining parameter by holding all other settings constant and
varying only the factor of interest. During these experiments, we used the ReLU activation function and
the max-pooling strategy for our CNN model with the mean squared error loss function, as these choices are
the most common in CNN architectures and give good performance. Finally, we combined the best variations
obtained from these experiments and used them for our suggested CNN model.
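The one-factor-at-a-time tuning procedure described above can be sketched as follows; `train_and_evaluate` is a hypothetical helper standing in for a full training run, and the baseline values follow Table 3:

```python
# Sketch of one-factor-at-a-time hyperparameter tuning: every setting stays at
# its baseline value (Table 3) while a single factor of interest is varied.

baseline = {"filter_sizes": (3, 4, 5), "n_filters": 100, "l2": 0.01,
            "dropout": 0.5, "batch_size": 64, "optimizer": "adam"}

def tune(factor, candidates, train_and_evaluate):
    """Vary one factor over its candidates; return the best-scoring value."""
    results = {}
    for value in candidates:
        config = dict(baseline, **{factor: value})  # vary only this factor
        results[value] = train_and_evaluate(config)
    return max(results, key=results.get)

# Example with a stand-in evaluator that happens to favor a dropout rate of 0.2.
best = tune("dropout", [0.1, 0.2, 0.3, 0.4, 0.5],
            lambda cfg: 1.0 - abs(cfg["dropout"] - 0.2))
print(best)  # 0.2
```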
5. RESULTS AND DISCUSSION
In this section, we present and discuss our experimental results obtained by each part of our vertical
selection system.
5.1. Semantic representation of the query using Doc2vec
Starting with the first part of our proposed system, once the doc2vec model is trained, we use it to
generate an embedding vector for each query in our collections (training, validation, and test sets).
These embedding vectors must capture the semantic meaning of each query. To make sure that the model
achieves this goal, we used our doc2vec model to find the most similar queries for sample queries in our
dataset; the resulting query pairs and their corresponding scores are presented in Table 4. As we can see
in this table, the similarity scores between the first queries and the second ones are significant.
We conclude that this doc2vec model is meaningful and can successfully recognize semantic information
among queries.
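A score like those reported in Table 4 is a similarity between two query vectors; below is a minimal cosine-similarity sketch, with toy vectors standing in for real doc2vec output (e.g. the result of `model.infer_vector(query.split())`):

```python
# Cosine similarity between two query embedding vectors, as used to produce
# the scores in Table 4. The vectors here are toy stand-ins, not real output.
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

u = np.array([0.9, 0.1, 0.3])   # stand-in vector for one query
v = np.array([0.8, 0.2, 0.35])  # stand-in vector for a semantically close query
print(round(cosine(u, v), 2))   # close queries score near 1, unrelated near 0
```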
9. Int J Elec & Comp Eng ISSN: 2088-8708
Vertical intent prediction approach based on Doc2vec and convolutional neural networks… (Sanae Achsas)
3877
Table 4. Similarity scores for sample query pairs from our collections
First query                      | Second query                | Similarity score
Mother’s Day gifts shopping site | Wallpaper computer          | 0.2437
Make resume online free          | Make resume free templates  | 0.9092
Apple latest news products       | home cleaning products      | 0.8547
bananas seeds plant              | bananas seed inside         | 0.9980
Virginia tourism                 | Iraq war pictures           | 0.1463
wallpaper magazine               | Asian culture               | 0.2911
5.2. Query level feature extraction using CNN
Having evaluated the quality of the query vectors obtained in the previous part, we now proceed to
the evaluation of our model architecture. We start by assessing the impact of the various parameters used
to configure our CNN model, in order to obtain the best possible results.
5.2.1. Impact of filter sizes
We explored the effect of various filter sizes while keeping the number of filters for each region
size fixed at 100. Figure 7 shows that the plot for the filter sizes [2-4] stayed above all the other plots
throughout the run, yielding better accuracy (69.94%) than filter sizes [3-5] and [4-6] (66.39% and 67.22%,
respectively), as reported in Table 5.
Figure 7. Accuracy comparison of filter size variations
Table 5. Accuracy comparison of filter size variations
Filter sizes [2-4] [3-5] [4-6]
Accuracy 69.94% 66.39% 67.22%
5.2.2. Impact of feature maps
Varying the number of filters per filter size does not help much, as shown in Figure 8, but a noticeable
accuracy result is obtained with 200 filters (68.48%), as shown in Table 6.
5.2.3. Impact of regularization
We used two common regularization strategies for our CNN: dropout and l2 norm constraints. We explore
the effect of these two strategies here. The effect of the l2 norm imposed on the weight vectors is
presented in Figure 9 and Table 7. We then experimented with varying the dropout rate from 0.1 to 0.5,
as shown in Figure 10 and Table 8, fixing the l2 norm constraint to 0.01. For the l2 norm constraint,
classification performance is highest with a value of 0, which produces the best results; higher values
often hurt performance, as shown in Figure 9 and Table 7. Separately, we considered the effect of
the dropout rate. Figure 10 indicates that a dropout rate of 0.2 gives the best accuracy (69.73%),
and accuracy decreases as the dropout rate increases, as reported in Table 8.
5.2.4. Impact of batch size
We next investigated the effect of the batch size. Figure 11 and Table 9 show that varying the batch
size also helps little; the best accuracy obtained was 67.01% with a batch size of 60.
Figure 8. Impact of the number of filters per filter size
Figure 9. Accuracy comparison with various l2 norm constraint values
Table 6. Impact of the number of filters per filter size
Number of filters  Accuracy
N=50               66.39%
N=100              66.81%
N=130              65.97%
N=150              66.60%
N=200              68.48%
Table 7. Accuracy comparison with various l2 norm constraint values
l2 norm constraint value  Accuracy
0                         75.37%
0.01                      65.97%
0.1                       65.14%
0.25                      65.34%
0.4                       64.93%
Figure 10. Accuracy comparison with various dropout rates
Figure 11. Accuracy comparison with various batch sizes
Table 8. Accuracy comparison with various dropout rates
Dropout rate  Accuracy
0.1           67.64%
0.2           69.73%
0.3           67.43%
0.4           65.97%
0.5           65.34%
Table 9. Accuracy comparison with various batch sizes
Batch size  Accuracy
30          65.34%
60          67.01%
120         64.72%
240         66.64%
5.2.5. Impact of optimizers
Regarding the optimizers, Figure 12 clearly shows that only the curves of the “adam” and “adadelta”
optimizers stayed at the top; they produce the best accuracy (66.81% and 65.97%) compared to “sgd” (56.16%),
as shown in Table 10. Next, we exploited the observations above to configure our CNN model with the best
parameter variations. We summarize the suggested parameters in Table 11.
Figure 12. Accuracy comparison with various optimizers
Table 10. Accuracy comparison with various optimizers
Optimizers adam sgd adadelta
Accuracy 66.81% 56.16% 65.97%
Table 11. Various parameters used for the suggested CNN model
Filter sizes Number of filters per filter size l2 regularization Dropout rate Batch size optimizer
[2-4] 200 0 0.2 60 adam
We can now evaluate our model with the suggested values. We assess the CNN architecture using the accuracy
and the mean squared error (MSE) loss, which are the most important and intuitive metrics for classification
performance. They are defined as follows:
accuracy = (# correct predictions) / (# predictions)   (1)

MSE = (1/n) Σ (y − ŷ)²   (2)

where (y − ŷ)² is the squared difference between the actual and the predicted result. Figures 13 and 14
show, respectively, the loss and the accuracy obtained on the training and validation sets. After training
for 100 epochs, we obtained the results shown in Table 12. These results show that an accuracy of up to 80%
appears to be the best value obtainable in this experiment.
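Equations (1) and (2) translate directly into code; a minimal sketch with illustrative inputs:

```python
# Direct implementations of the accuracy and MSE metrics defined above.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels, as in equation (1)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def mse(y_true, y_pred):
    """Mean of the squared differences (y - ŷ)², as in equation (2)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
print(round(mse([1.0, 0.0], [0.8, 0.1]), 3))  # 0.025
```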
Figure 13. Loss of the proposed model for the training and validation sets
Figure 14. Accuracy of the proposed model for the training and validation sets
Table 12. Classification accuracy and loss results of the proposed approach
Training accuracy Validation accuracy Training loss Validation loss
Baseline model 65.34% 70.37% 3.87% 4.48%
Suggested model 79.75% 72.22% 1.56% 3.73%
The difference between the results obtained by the baseline model and the suggested model clearly shows
the impact of the suggested values, since there is a noticeable increase in accuracy between the two models.
Moreover, Figure 14 shows that the classification accuracy varies, but after 20 epochs we obtain the highest
value. There is also a difference in accuracy between the training and validation sets.
On the other hand, by using mean squared error (MSE) as the classification loss, we optimize by reducing
the average squared distance between our predictions and the true values. In Figure 13, we can see that
the validation loss follows the training loss and keeps decreasing. These observations show that our system
achieves significant results (accuracy and loss), demonstrating its performance even though our data is not
that large. In most cases, the size of the data matters less than the variation among the samples, and our
training dataset includes a wide variety of samples.
5.3. The effectiveness of our vertical intent prediction system
5.3.1. Predictions on unseen data
To complete our experiments, we evaluate the performance of our model on the task of predicting vertical
intents for new unseen data. Having trained our model and evaluated its accuracy on the validation set,
we can now use it to make predictions on the test dataset (FedWeb’14), which has not been used for any
training or validation operation. This gives an estimate of how well the trained algorithm performs when
making predictions on unseen data.
As mentioned previously, we already generated query vectors for the test dataset using doc2vec in
the first part of this experimentation. We fed these vectors to our CNN model to obtain a vertical intent
score for each query. Evaluating the quality of these predictions using mean squared error gives 1.31%,
which shows that our model achieves accurate predictions.
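A minimal sketch of this evaluation step, assuming `cnn` is the trained Keras model; here random NumPy arrays stand in for the real doc2vec test vectors and ground-truth intent scores of the 50 test queries:

```python
# Sketch of the unseen-data evaluation: score CNN predictions against the
# true vertical intent values with mean squared error, as in equation (2).
import numpy as np

rng = np.random.default_rng(0)
true_scores = rng.random((50, 5))  # stand-in ground-truth vertical intents
predictions = rng.random((50, 5))  # stand-in for cnn.predict(test_vectors)

# Mean squared error over all query/vertical pairs.
test_mse = float(np.mean((true_scores - predictions) ** 2))
print(0.0 <= test_mse <= 1.0)  # True: scores in [0, 1) bound the error
```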
6. CONCLUSION AND FUTURE WORK
In this paper, we proposed a vertical intent prediction architecture to improve the vertical selection
task. This approach outperforms traditional vertical selection approaches in that it satisfies the user
vertical intent automatically and without any feature engineering, unlike approaches that address this
problem by training a machine-learned model on features that are supposed to detect the relevance of
a vertical to the query, which is expensive and time-consuming. Our experimental findings show that our
model achieves acceptable accuracy. This research can be continued in future work to further increase
accuracy. In addition, this work represents an advancement in the context of vertical selection, a field
where the literature is still limited, and may renew research in this direction.
REFERENCES
[1] S. Mustajab, Mohd. Kashif Adhami, and R. Ali, “An Overview of Aggregating Vertical Results into Web Search
Results,” Int. J. Comput. Appl., vol. 69, no. 17, pp. 21-28, May 2013. doi: 10.5120/12063-8107.
[2] J. Arguello, “Aggregated search,” Found. Trends Inf. Retr., vol. 10, no. 5, pp. 365-502, 2017.
[3] S. Achsas and E. H. Nfaoui, “An Analysis Study of Vertical Selection Task in Aggregated Search,” Procedia
Computer Science, Elsevier, vol. 148, pp. 171-180, Jan. 2019. doi: 10.1016/j.procs.2019.01.021.
[4] X. Li, Y.-Y. Wang, and A. Acero, “Learning query intent from regularized click graphs,” in Proceedings of the 31st
annual international ACM SIGIR conference on Research and development in information retrieval-SIGIR ’08,
Singapore, Singapore, pp. 339, 2008. doi: 10.1145/1390334.1390393.
[5] F. Diaz, “Integration of news content into web results,” in Proceedings of the Second ACM International
Conference on Web Search and Data Mining-WSDM ’09, Barcelona, Spain, pp. 182, 2009.
[6] G. Tsur, Y. Pinter, I. Szpektor, and D. Carmel, “Identifying Web Queries with Question Intent,” in Proceedings of
the 25th International Conference on World Wide Web-WWW ’16, Montréal, Québec, Canada,
pp. 783-793, 2016, doi: 10.1145/2872427.2883058.
[7] J. Arguello, F. Diaz, J. Callan, and J.-F. Crespo, “Sources of evidence for vertical selection,” in Proceedings of
the 32nd international ACM SIGIR conference on Research and development in information retrieval-SIGIR ’09,
Boston, MA, USA, pp. 315, 2009. doi: 10.1145/1571941.1571997.
[8] J. Arguello, F. Diaz, and J.-F. Paiement, “Vertical selection in the presence of unlabeled verticals,” in Proceeding
of the 33rd international ACM SIGIR conference on Research and development in information retrieval-SIGIR ’10,
Geneva, Switzerland, pp. 691, 2010. doi: 10.1145/1835449.1835564.
[9] F. Diaz and J. Arguello, “Adaptation of offline vertical selection predictions in the presence of user feedback,”
in Proceedings of the 32nd international ACM SIGIR conference on Research and development in information
retrieval-SIGIR ’09, Boston, MA, USA, pp. 323, 2009. doi: 10.1145/1571941.1571998.
[10] D. Bakrola and S. Gandhi, “Enhancing Web Search Results Using Aggregated Search,” in Proceedings of
International Conference on ICT for Sustainable Development, S. C. Satapathy, A. Joshi, N. Modi, and N. Pathak,
Eds. Singapore: Springer Singapore, vol. 409, pp. 675-688, 2016.
[11] M. Shokouhi and L. Si, “Federated Search,” Found. Trends Inf. Retr., vol. 5, no. 1, pp. 1-102, 2011.
[12] J. P. Callan, Z. Lu, and W. B. Croft, “Searching Distributed Collections with Inference Networks,” pp. 8.
[13] R. Aly, D. Hiemstra, and T. Demeester, “Taily: shard selection using the tail of score distributions,” in Proceedings
of the 36th international ACM SIGIR conference on Research and development in information retrieval-SIGIR ’13,
Dublin, Ireland, pp. 673, 2013, doi: 10.1145/2484028.2484033.
[14] L. Si and J. Callan, “Relevant Document Distribution Estimation Method for Resource Selection,” pp. 8.
[15] P. Thomas and M. Shokouhi, “SUSHI: scoring scaled samples for server selection,” in Proceedings of the 32nd
international ACM SIGIR conference on Research and development in information retrieval-SIGIR ’09, Boston,
MA, USA, pp. 419, 2009, doi: 10.1145/1571941.1572014.
[16] J. Zamora, M. Mendoza, and H. Allende, “Query Intent Detection Based on Query Log Mining,” Journal of Web
Engineering, vol. 13, no. 1&2, pp. 30, 2014.
[17] D. Jiang and L. Yang, “Query intent inference via search engine log,” Knowl. Inf. Syst., vol. 49, no. 2, pp. 661-685,
Nov. 2016. doi: 10.1007/s10115-015-0915-7.
[18] J. Gu, C. Feng, X. Gao, Y. Wang, and H. Huang, “Query Intent Detection Based on Clustering of Phrase
Embedding,” in Social Media Processing, Y. Li, G. Xiang, H. Lin, and M. Wang, Eds. Singapore: Springer
Singapore, vol. 669, pp. 110-122, 2016.
[19] J.-K. Kim, G. Tur, A. Celikyilmaz, B. Cao, and Y.-Y. Wang, “Intent detection using semantically enriched word
embeddings,” in 2016 IEEE Spoken Language Technology Workshop (SLT), San Diego, CA, pp. 414-419, 2016.
doi: 10.1109/SLT.2016.7846297.
[20] H. B. Hashemi, A. Asiaee, R. Kraft, “Query Intent Detection using Convolutional Neural Networks,” pp. 5, 2016.
[21] S. Makeev, A. Plakhov, and P. Serdyukov, “Personalizing Aggregated Search,” in Advances in Information
Retrieval, M. de Rijke, T. Kenter, A. P. de Vries, C. Zhai, F. de Jong, K. Radinsky, and K. Hofmann, Eds. Cham:
Springer International Publishing, vol. 8416, pp. 197-209, 2014.
[22] A. Kopliku, K. Pinel-Sauvagnat, and M. Boughanem, “Aggregated search: A new information retrieval paradigm,”
ACM Comput. Surv., vol. 46, no. 3, pp. 1-31, Jan. 2014. doi: 10.1145/2523817.
[23] M. Lalmas, “Aggregated Search,” in Advanced Topics in Information Retrieval, M. Melucci and R. Baeza-Yates,
Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, vol. 33, pp. 109-123, 2011.
[24] S. Achsas and E. H. Nfaoui, “Intelligent layer for relational aggregated search based on deep neural networks,”
in 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA), Agadir,
Morocco, pp. 1-2, 2016. doi: 10.1109/AICCSA.2016.7945754.
[25] K. Zhou, R. Cummins, M. Halvey, M. Lalmas, and J. M. Jose, “Assessing and Predicting Vertical Intent for Web
Queries,” in Advances in Information Retrieval, R. Baeza-Yates, A. P. de Vries, H. Zaragoza, B. B. Cambazoglu,
V. Murdock, R. Lempel, and F. Silvestri, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, vol. 7224,
pp. 499-502, 2012.
[26] Q. V. Le and T. Mikolov, “Distributed Representations of Sentences and Documents,” arXiv:1405.4053 [cs], 2014.
[27] M. Campr and K. Ježek, “Comparing Semantic Models for Evaluating Automatic Document Summarization,”
in Text, Speech, and Dialogue, P. Král and V. Matoušek, Eds. Cham: Springer International Publishing, vol. 9302,
pp. 252-260, 2015.
[28] S. Lee, X. Jin, and W. Kim, “Sentiment classification for unlabeled dataset using Doc2Vec with JST,”
in Proceedings of the 18th Annual International Conference on Electronic Commerce: e-Commerce in Smart
Connected World (ICEC ’16), Suwon, Republic of Korea, pp. 1-5, 2016. doi: 10.1145/2971603.2971631.
[29] K. Hashimoto, G. Kontonatsios, M. Miwa, and S. Ananiadou, “Topic detection using paragraph vectors to support
active learning in systematic reviews,” J. Biomed. Inform., vol. 62, pp. 59-65, Aug. 2016.
[30] H. Palangi, et al., “Deep Sentence Embedding Using Long Short-Term Memory Networks: Analysis and
Application to Information Retrieval,” IEEE/ACM Trans. Audio Speech Lang. Process., vol. 24, no. 4, pp. 694-707,
Apr. 2016. doi: 10.1109/TASLP.2016.2520371.
[31] S. Achsas and E. H. Nfaoui, “Improving relational aggregated search from big data sources using stacked
autoencoders,” Cognitive Systems Research, Elsevier, vol. 51, pp. 61-71, Oct. 2018.
[32] Q. Wu, P. Wang, C. Shen, A. Dick, and A. van den Hengel, “Ask Me Anything: Free-Form Visual Question
Answering Based on Knowledge from External Sources,” in 2016 IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 4622-4630, 2016. Doi: 10.1109/CVPR.2016.500.
[33] S. Achsas and E. H. Nfaoui, “Improving relational aggregated search from big data sources using deep learning,”
in 2017 Intelligent Systems and Computer Vision (ISCV), Fez, Morocco, pp. 1-6, 2017.
[34] N. Ben-Lhachemi and E. H. Nfaoui, “Using Tweets Embeddings for Hashtag Recommendation in Twitter,”
Procedia Computer Science, Elsevier, vol. 127, pp. 7-15, 2018. doi: 10.1016/j.procs.2018.01.092.
Int J Elec & Comp Eng, Vol. 10, No. 4, August 2020: 3869-3882. ISSN: 2088-8708.
[35] F. Kalloubi, N. El Habib, and O. El Beqqali, “Graph based tweet entity linking using DBpedia,” in 2014 IEEE/ACS
11th International Conference on Computer Systems and Applications (AICCSA), Doha, pp. 501-506, 2014.
doi: 10.1109/AICCSA.2014.7073240.
[36] Y. Chen, L. Xu, K. Liu, D. Zeng, and J. Zhao, “Event Extraction via Dynamic Multi-Pooling Convolutional Neural
Networks,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the
7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China,
pp. 167-176, 2015. doi: 10.3115/v1/P15-1017.
[37] M. Denil, A. Demiraj, N. Kalchbrenner, P. Blunsom, and N. de Freitas, “Modelling, Visualising and Summarising
Documents with a Single Convolutional Neural Network,” arXiv:1406.3830 [cs, stat], Jun. 2014.
[38] F. Meng, Z. Lu, M. Wang, H. Li, W. Jiang, and Q. Liu, “Encoding Source Language with Convolutional Neural
Network for Machine Translation,” arXiv:1503.01838 [cs], Mar. 2015.
[39] Y. Kim, “Convolutional Neural Networks for Sentence Classification,” arXiv:1408.5882 [cs], Aug. 2014.
[40] A. Severyn and A. Moschitti, “Modeling Relational Information in Question-Answer Pairs with Convolutional
Neural Networks,” arXiv:1604.01178 [cs], Apr. 2016.
[41] E. Grefenstette, P. Blunsom, N. de Freitas, and K. M. Hermann, “A Deep Architecture for Semantic Parsing,”
arXiv:1404.7296 [cs], Apr. 2014.
[42] N. Kalchbrenner, E. Grefenstette, and P. Blunsom, “A Convolutional Neural Network for Modelling Sentences,”
arXiv:1404.2188 [cs], Apr. 2014.
[43] C. Zhu, X. Qiu, X. Chen, and X. Huang, “A Re-ranking Model for Dependency Parser with Recursive
Convolutional Neural Network,” arXiv:1505.05667 [cs], May 2015.
[44] T. Yamamoto, et al., “Overview of the NTCIR-12 IMine-2 Task,” in Proceedings of the 12th NTCIR Conference
on Evaluation of Information Access Technologies, Tokyo, Japan, June 7-10, 2016, 19 pp.
[45] T. Demeester, D. Trieschnigg, D. Nguyen, K. Zhou, and D. Hiemstra, “Overview of the TREC 2014 federated Web
search track,” Ghent Univ. (Belgium), 2014.
[46] B. Carterette, V. Pavlu, H. Fang, and E. Kanoulas, “Million Query Track 2009 Overview,” 21 pp., 2009.
BIOGRAPHIES OF AUTHORS
Sanae Achsas is a PhD student at the University of Sidi Mohammed Ben Abdellah in Fez,
Morocco. Her thesis concerns improving aggregated search using recent machine learning
techniques such as deep learning. She has published several papers in reputed journals
and international conferences. She is a member of the International Neural Network Society
Morocco regional chapter. Sanae can be contacted at sanae.achsas@usmba.ac.ma.
El Habib Nfaoui is currently a Professor of Computer Science at the University of Sidi
Mohamed Ben Abdellah, Fez, Morocco. He received his PhD in Computer Science from
the University of Sidi Mohamed Ben Abdellah, Morocco, and the University of Lyon,
France, under a Cotutelle agreement (doctorate in joint-supervision), in 2008, and then his
HU Diploma (accreditation to supervise research) in Computer Science, in 2013, from
the University of Sidi Mohamed Ben Abdellah. His current research interests include
Information Retrieval, Language Representation Learning, Web mining and Text mining,
Semantic Web, Web services, Social networks, Machine learning and Multi-Agent Systems.
Dr. El Habib Nfaoui has published in reputed international journals, books, and conferences,
and has edited six conference proceedings and special issue books. He has served as
a reviewer for scientific journals and on the program committees of several conferences. He is
a co-founder and an executive member of the International Neural Network Society Morocco
regional chapter. He co-founded the International Conference on Intelligent Computing in
Data Sciences (ICDS2017) and the International Conference on Intelligent Systems and
Computer Vision (ISCV2015).