We humans are surrounded by an unprecedented wealth of information, available as documents, databases, and other resources. Accessing this information is difficult: merely possessing it does not guarantee that it can be searched or retrieved from within the activity we are engaged in. Search engines would need to be customized to handle such queries, and they are sometimes unaware of the information already held within the system. The method of keyword extraction and clustering addresses this shortcoming by spontaneously recommending documents related to the user's current activity. As a conversation takes place, the important text is extracted from it; the extracted words are grouped and then matched against parts of the documents. The method uses Natural Language Processing to extract keywords and to form subgroups that read as meaningful statements, and Hierarchical Clustering to build clusters from the keywords, where the similarity of two keywords is measured by the Euclidean distance. This paper reviews the various methods for such a system.
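As a rough illustration of the clustering step described above, the sketch below runs a naive single-linkage agglomerative clustering over toy 2-D keyword vectors, using Euclidean distance as the similarity measure. The keyword names, vectors, and threshold are all hypothetical values for demonstration, not the paper's actual data or algorithm.

```python
import math

def euclidean(a, b):
    # Euclidean distance between two keyword embedding vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(vectors, threshold):
    # Naive single-linkage agglomerative clustering: start with singleton
    # clusters and repeatedly merge the closest pair until the smallest
    # inter-cluster distance exceeds the threshold.
    clusters = [[k] for k in vectors]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(euclidean(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] += clusters.pop(j)
    return clusters

# Toy 2-D "embeddings" for four keywords (illustrative values only)
vectors = {"meeting": (0.0, 0.0), "agenda": (0.1, 0.1),
           "football": (5.0, 5.0), "goal": (5.1, 4.9)}
print(agglomerative(vectors, threshold=1.0))
```

With these toy vectors, "meeting"/"agenda" and "football"/"goal" end up in separate clusters because their cross-cluster distance exceeds the threshold.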
Survey of Machine Learning Techniques in Textual Document Classification (IOSR Journals)
Text document classification aims at assigning one or more predefined categories to a document, based on the likelihood expressed by a training set of labeled documents. Many machine learning algorithms play an important role in training the system on these predefined categories. The importance of the machine learning approach motivated this study of text document classification based on the available statistical event models. The aim of this paper is to present the important techniques and methodologies employed for text document classification, while also raising awareness of some of the interesting challenges that remain to be solved, focusing mainly on text representation and machine learning techniques.
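A common statistical event model mentioned in this line of work is the multinomial Naive Bayes classifier. The sketch below is a minimal, self-contained version with Laplace smoothing; the documents, labels, and tokenization are toy examples of my own, not from the surveyed paper.

```python
import math
from collections import Counter, defaultdict

# Tiny multinomial Naive Bayes trainer; documents are token lists.
def train(docs):  # docs: list of (tokens, label)
    prior, counts, vocab = Counter(), defaultdict(Counter), set()
    for tokens, label in docs:
        prior[label] += 1
        counts[label].update(tokens)
        vocab.update(tokens)
    return prior, counts, vocab

def classify(tokens, prior, counts, vocab):
    def score(label):
        total = sum(counts[label].values())
        s = math.log(prior[label] / sum(prior.values()))
        for t in tokens:  # Laplace (add-one) smoothing
            s += math.log((counts[label][t] + 1) / (total + len(vocab)))
        return s
    return max(prior, key=score)

# Hypothetical training set with two categories
docs = [(["win", "match", "score"], "sports"),
        (["vote", "election", "policy"], "politics"),
        (["goal", "score", "team"], "sports")]
model = train(docs)
print(classify(["score", "team"], *model))
```

On this toy set the unseen document is assigned to "sports", since both its tokens occur in the sports training documents.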
A Federated Search Approach to Facilitate Systematic Literature Review in Sof... (ijseajournal)
To impact industry, researchers developing technologies in academia need to provide tangible evidence of the advantages of using them. Systematic Literature Review (SLR) has become a prominent methodology in evidence-based research. Although the adoption of SLR in software engineering practice is still limited, it has produced valuable research and is becoming more common. However, digital libraries and scientific databases, though the best research resources, do not provide adequate mechanisms for SLRs, especially in software engineering. Moreover, any loss of data may change the SLR results and lead to research bias, so the search process and evidence collection in an SLR are critical. This paper provides some tips to enhance the SLR process. Its main contribution is a federated search tool that provides an automatic, integrated search mechanism over well-known software engineering databases. The results of a case study show that this approach not only reduces the time required to perform an SLR and facilitates its search process, but also improves its reliability and encourages the increasing use of SLRs.
This paper addresses the extraction of important words from conversations, with the objective of using these keywords to retrieve, for each short audio fragment, a small number of potentially relevant documents that can be recommended to participants just in time. However, even a short audio fragment contains a variety of words, potentially related to several topics; moreover, an automatic speech recognition (ASR) system introduces errors into the output. It is therefore hard to infer precisely the information needs of the conversation participants. We first propose an algorithm to extract keywords from the output of an ASR system (or from a manual transcript, for testing) that accommodates the potential diversity of topics and reduces ASR noise. We then use a technique that builds multiple implicit queries from the selected keywords, which in turn produce a list of relevant documents. The scores show that our proposal improves over previous systems that consider only word frequency or topic similarity, and represents a promising solution for a document recommender system to be used within conversations.
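One simple way to picture the "implicit queries from selected keywords" step is to rank the extracted keywords per topic and issue one short query per topic. The topic weights, keyword set, and top-2 cutoff below are hypothetical stand-ins for whatever topic model and selection rule the paper actually uses.

```python
# Hypothetical topic weights per keyword; in the paper's setting these
# would come from a topic model over the ASR transcript.
weights = {"ball":   {"sports": 0.9, "news": 0.1},
           "score":  {"sports": 0.8, "news": 0.2},
           "budget": {"sports": 0.1, "news": 0.9},
           "vote":   {"sports": 0.2, "news": 0.8}}

def implicit_queries(weights, topics):
    # One implicit query per topic: keywords sorted by that topic's weight,
    # keeping only the top-2 keywords (an arbitrary illustrative cutoff).
    queries = {}
    for topic in topics:
        ranked = sorted(weights, key=lambda k: weights[k][topic], reverse=True)
        queries[topic] = " ".join(ranked[:2])
    return queries

print(implicit_queries(weights, ["sports", "news"]))
```

Each resulting query covers one conversational topic, so documents can be retrieved for several topics in parallel rather than from one noisy mixed query.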
Building a recommendation system based on the job offers extracted from the w... (IJECEIAES)
Recruitment, or job search, is increasingly carried out worldwide by a large population of users through various channels such as websites, platforms, and professional networks. Given the large volume of information in job descriptions and user profiles, it is difficult to appropriately match a user's profile with a job description, and vice versa. The conventional job search approach has drawbacks: the job seeker must search for job offers on each recruitment platform, manage multiple accounts, and apply for the relevant vacancies, which wastes considerable time and effort. The contribution of this research is the construction of a recommendation system based on job offers extracted from the web and on the e-portfolios of job seekers. After the data is extracted, natural language processing is applied to structure it and make it ready for filtering and analysis. The proposed system is content-based: it measures the degree of correspondence between the attributes of the e-portfolio and those of each job offer within the same list of competence specialties using the Euclidean distance, and the results are sorted in decreasing order of relevance so that the most relevant job offers are displayed first.
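A minimal sketch of the content-based matching described above: encode the e-portfolio and each job offer as numeric attribute vectors, compute Euclidean distances, and sort offers by ascending distance (smallest distance = most relevant). The vectors and offer names are invented for illustration; the paper's actual attributes come from its NLP pipeline.

```python
import math

def euclidean(a, b):
    # Euclidean distance between two attribute vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical numeric skill vectors (e.g., competence levels per skill)
portfolio = (4, 2, 5)          # job seeker's e-portfolio
offers = {"offer_a": (4, 3, 5), "offer_b": (1, 5, 0), "offer_c": (3, 2, 4)}

# Rank offers from most to least relevant (smallest distance first)
ranked = sorted(offers, key=lambda o: euclidean(portfolio, offers[o]))
print(ranked)
```

Here "offer_a" ranks first because its vector lies closest to the portfolio vector.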
Co-Extracting Opinions from Online Reviews (Editor IJCATR)
Extraction of opinion targets and opinion words from online reviews is an important and challenging task in opinion mining. Opinion mining is the use of natural language processing, text analysis, and computational techniques to identify and recover subjective information in source materials. This paper proposes a supervised word alignment model that identifies opinion relations. The paper also focuses on topical relations, extracting the relevant information or features only from a particular set of online reviews. It is based on a feature extraction algorithm that identifies the potential features. Finally, the items are ranked based on the frequency of positive and negative reviews. Compared to previous methods, our model captures opinion relations and extracts features more precisely; one of its main advantages is the better precision obtained through the supervised alignment model. In addition, an opinion relation graph is used to express the relationship between opinion targets and opinion words.
A Domain Based Approach to Information Retrieval in Digital Libraries - Rotel... (University of Bari, Italy)
The current abundance of electronic documents requires automatic techniques that support users in understanding their content and extracting useful information. To this aim, improving retrieval performance must go beyond simple lexical interpretation of user queries and pass through an understanding of their semantic content and aims. It goes without saying that any digital library would benefit enormously from effective Information Retrieval techniques to offer its users. This paper proposes an approach to Information Retrieval based on matching the domain of discourse between the query and the documents in the repository. Such an association is based on standard general-purpose linguistic resources (WordNet and WordNet Domains) and on a novel similarity assessment technique. Although the work is at a preliminary stage, interesting initial results suggest continuing to extend and improve the approach.
Performance Evaluation of Query Processing Techniques in Information Retrieval (idescitation)
The first element of the search process is the query. Since the user query is on average restricted to two or three keywords, it is ambiguous to the search engine. Given the user query, the goal of an Information Retrieval (IR) system is to retrieve information that might be useful or relevant to the user's information need; hence, query processing plays an important role in an IR system. Query processing can be divided into four categories: query expansion, query optimization, query classification, and query parsing. In this paper an attempt is made to evaluate the performance of query processing algorithms in each category. The evaluation is based on the dataset specified by the Forum for Information Retrieval [FIRE15]; the criteria used for evaluation are precision and relative recall. The analysis rests on the importance of each step in query processing. The experimental results show the significance of each step in query processing, as well as the relevance of web semantics and spelling correction in the user query.
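The two evaluation criteria named above are straightforward to compute. A small sketch, with invented document IDs and relevance judgments: precision is the fraction of retrieved documents that are relevant, while relative recall measures recall against the pool of relevant documents found by all compared systems rather than the (unknown) complete relevant set.

```python
def precision(retrieved, relevant):
    # Fraction of retrieved documents that are relevant
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def relative_recall(retrieved, pooled_relevant):
    # Recall measured against the pooled relevant documents found
    # by all compared systems, not the full (unknown) relevant set.
    return len(set(retrieved) & set(pooled_relevant)) / len(pooled_relevant)

# Hypothetical run: system A retrieves four documents
retrieved_a = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d2", "d5"}   # judged relevant documents
pool = relevant                  # pooled across all systems in this toy case

print(precision(retrieved_a, relevant))    # 2 of 4 retrieved are relevant
print(relative_recall(retrieved_a, pool))  # 2 of 3 pooled relevant found
```

Relative recall is the practical choice when, as with FIRE-style evaluations, exhaustive relevance judgments for the whole collection are unavailable.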
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION (ijistjournal)
User-generated content on the web grows rapidly in this emergent information age. Evolutionary changes in technology make use of this information to capture the user's essence, so that the useful information is finally exposed to information seekers. Most existing research on text information processing focuses on the factual domain rather than the opinion domain. In this paper we detect online hotspot forums by computing sentiment analysis over the text data available in each forum. The approach analyses the forum text and computes a value for each word of the text. It combines K-means clustering with a Support Vector Machine trained by Particle Swarm Optimization (SVM-PSO) to group the forums into two clusters, hotspot and non-hotspot forums, within the current time span. The accuracy of the proposed system is compared with other classification algorithms such as Naive Bayes, decision trees, and SVM. The experiments show that K-means and SVM-PSO together achieve highly consistent results.
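To illustrate only the clustering half of the pipeline: if each forum is reduced to a single sentiment-intensity score, a 1-D k-means with k=2 splits the forums into two groups. The scores below are invented, and labeling the higher-scoring cluster "hotspot" is my assumption about the convention, not a detail taken from the paper.

```python
# Hypothetical per-forum sentiment-intensity scores.
def kmeans_1d(values, iters=20):
    # 1-D k-means with k=2; centroids start at the extremes.
    c = [min(values), max(values)]
    for _ in range(iters):
        groups = ([], [])
        for v in values:
            # assign each value to its nearest centroid (0 = low, 1 = high)
            groups[abs(v - c[0]) > abs(v - c[1])].append(v)
        c = [sum(g) / len(g) if g else c[i] for i, g in enumerate(groups)]
    return c, groups

scores = [0.1, 0.2, 0.15, 0.8, 0.9, 0.85]
centroids, (non_hotspot, hotspot) = kmeans_1d(scores)
print(hotspot)  # forums in the higher-intensity cluster
```

In the paper's full system, the SVM-PSO classifier would then refine or validate this unsupervised split.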
Co-citation network investigation (IAESIJEECS)
This paper reports a co-citation network investigation (bibliometric investigation) of 10 journals in the management of technology (MOT) field. Drawing on different bibliometric approaches, network analysis tools identify and examine the concepts covered by the field and their interconnections. Specific outcomes from various levels of analysis reveal the different dimensions of technology management: co-word terms identify topics, the journal co-citation network links the field to other disciplines, and the co-citation network shows groupings of topics. The analysis demonstrates that MOT plays a connecting role in integrating ideas from several distinct disciplines. This suggests that management and strategy are central to MOT, which essentially relates to the firm rather than to policy. There is also a dual focus on capabilities, with subtle differences in how these ideas are viewed: either through an inward-looking lens, to see how organizations function, or more outwardly, to understand context and change in the landscape.
Open domain Question Answering System - Research project in NLP (GVS Chaitanya)
Using a computer to answer questions has been a human dream since the beginning of the digital era. A first step towards such an ambitious goal is to deal with natural language, enabling the computer to understand what its user asks. The discipline that studies the connection between natural language and the representation of its meaning via computational models is computational linguistics. Within this discipline, Question Answering can be defined as the task that, given a question formulated in natural language, aims at finding one or more concise answers. Improvements in technology and the explosive demand for better information access have reignited interest in Q&A systems. The wealth of information on the web makes it an attractive resource for seeking quick answers to factual questions such as "Who is the first American to land in space?" or "What is the second tallest mountain in the world?", yet today's most advanced web search systems (Bing, Google, Yahoo) make it surprisingly tedious to locate the answers. A Q&A system aims to develop techniques that go beyond retrieval of relevant documents in order to return exact answers to natural-language factoid questions.
Research Report on Document Indexing (Nithish Kumar)
This report researches document indexing in search engines. The main goal of information retrieval is to return the exact response to a user's specific query. Information search and retrieval is a large process; to realize it effectively, an application must employ techniques such as document indexing, page ranking, and clustering. Among these, the document index plays a vital role in searching: instead of scanning hundreds of thousands of documents, the engine goes directly to the relevant index entry and produces the output. Our focus here is indexing; the essential point is that storing an index optimizes the speed and performance of finding the appropriate document for the user's query.
The conclusion is that a context-based index approach, built mainly from the source documents, should be used in query retrieval. Instead of searching every page on the server, this is technically the better way to find results: it saves time and reduces the load on the server.
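The core idea of going "directly to the particular index" can be sketched as a minimal inverted index mapping each term to the set of documents containing it. The documents and query are toy examples; real engines add tokenization, stemming, and ranking on top of this structure.

```python
from collections import defaultdict

# Minimal inverted index: term -> set of document ids.
def build_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    # AND semantics: return documents containing every query term
    sets = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*sets) if sets else set()

# Hypothetical three-document collection
docs = {"d1": "search engine index", "d2": "page ranking for search",
        "d3": "document clustering"}
index = build_index(docs)
print(search(index, "search index"))
```

Lookup cost now depends on the query terms' posting lists, not on the total number of documents, which is exactly the time and server-load saving the report argues for.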
How to Create Map Views in the Odoo 17 ERP - Celine George
The map views are useful for providing a geographical representation of data. They allow users to visualize and analyze the data in a more intuitive manner.
This is a presentation by Dada Robert in a Your Skill Boost masterclass organised by the Excellence Foundation for South Sudan (EFSS) on Saturday, the 25th and Sunday, the 26th of May 2024.
He discussed the concept of quality improvement, emphasizing its applicability to various aspects of life, including personal, project, and program improvements. He defined quality as doing the right thing at the right time in the right way to achieve the best possible results and discussed the concept of the "gap" between what we know and what we do, and how this gap represents the areas we need to improve. He explained the scientific approach to quality improvement, which involves systematic performance analysis, testing and learning, and implementing change ideas. He also highlighted the importance of client focus and a team approach to quality improvement.
We all have good and bad thoughts from time to time and situation to situation. We are bombarded daily with spiraling thoughts (both negative and positive) that create all-consuming feelings, making them difficult to manage and causing suffering. Good thoughts are like a mobile signal (positive thought) amidst noise (negative thought) in the atmosphere. Negative thoughts, like noise, outweigh positive thoughts. These thoughts often create unwanted confusion, trouble, stress and frustration in our minds as well as chaos in our physical world. Negative thoughts are also known as "distorted thinking".
Welcome to TechSoup New Member Orientation and Q&A (May 2024) - TechSoup
In this webinar you will learn how your organization can access TechSoup's wide variety of product discount and donation programs. From hardware to software, we'll give you a tour of the tools available to help your nonprofit with productivity, collaboration, financial management, donor tracking, security, and more.
Unit 8 - Information and Communication Technology (Paper I) - Thiyagu K
These slides describe the basic concepts of ICT, the basics of email, emerging technology, and digital initiatives in education. This presentation aligns with the UGC Paper I syllabus.
Ethnobotany and Ethnopharmacology:
Ethnobotany in herbal drug evaluation,
Impact of Ethnobotany in traditional medicine,
New development in herbals,
Bio-prospecting tools for drug discovery,
Role of Ethnopharmacology in drug evaluation,
Reverse Pharmacology.
Synthetic Fiber Construction in Lab - Pavel (NSTU)
Synthetic fiber production is a fascinating and complex field that blends chemistry, engineering, and environmental science. By understanding these aspects, students can gain a comprehensive view of synthetic fiber production, its impact on society and the environment, and the potential for future innovations. Synthetic fibers play a crucial role in modern society, impacting daily life, industry, and the environment. They offer a range of benefits from cost-effectiveness and versatility to innovative applications and performance characteristics. While they pose environmental challenges, ongoing research and development aim to create more sustainable and eco-friendly alternatives. Understanding the importance of synthetic fibers helps in appreciating their role in the economy, industry, and daily life, while also emphasizing the need for sustainable practices and innovation.
Automatic Topics Identification For Reviewer Assignment
S. Ferilli, N. Di Mauro, T.M.A. Basile, F. Esposito, and M. Biba
Dipartimento di Informatica, University of Bari, Italy
{ferilli,ndm,basile,esposito,biba}@di.uniba.it
Abstract. Scientific conference management involves many complex and multi-faceted activities, which makes it highly desirable for the organizers to have a Web-based management system that makes some of them a little easier to carry out. One such activity is the assignment of submitted papers to suitable reviewers, involving the authors, the reviewers and the conference chair. Authors who submit papers usually must fill in a form with the paper title, abstract and a set of conference topics that fit their submission's subject. Reviewers are required to register and declare their expertise on the conference topics (among other things). Finally, the conference chair has to carry out the review assignment taking into account the information provided by both the authors (about their paper) and the reviewers (about their competencies). Thus, all the subtasks needed for the assignment are currently carried out manually by the actors. While this can be merely boring for authors and reviewers, for the conference chair the task is also very complex and time-consuming.
In this paper we propose the exploitation of intelligent techniques to automatically extract paper topics from titles and abstracts, and reviewers' expertise from the titles of their publications available on the Internet. Subsequently, such knowledge is exploited by an expert system able to automatically perform the assignments. The proposed methods were evaluated on a real conference dataset, obtaining good results when compared to handmade assignments, both in terms of quality and user satisfaction and in terms of reduction in execution time with respect to humans performing the same process.
1 Introduction
Organizing scientific conferences is a complex and multi-faceted activity that
often requires the use of a Web-based management system to make some tasks
a little easier to carry out, such as the job of reviewing papers. Some of the
features typically provided by these packages are: submission of abstracts and
papers by Authors; submission of reviews by the Program Committee Members
(PCMs); download of papers by the Program Committee (PC); handling of re-
viewers preferences and bidding; Web-based assignment of papers to PCMs for
review; review progress tracking; Web-based PC meeting; notification of accep-
tance/rejection; sending e-mails for notifications. One of the hardest and most
time-consuming tasks in scientific conference organization is the process of
assigning reviewers to submitted papers. Due to the many constraints to be ful-
filled, carrying out manually such a task is very tedious and difficult, and does
not guarantee to result in the best solution.
In the current practice, before the submission phase starts, the Chair usually
sets up a list of research topics of interest for the conference, and all reviewers are
asked to specify which of them correspond to their main areas of expertise. On
the other hand, during the submission process, authors are asked to explicitly
state which conference topics apply to their papers. Such information provides
a first guideline for associating reviewers with papers. One possible source
of problems in the above procedure lies in the topics selected by the authors
being sometimes misleading with respect to the real topic of the paper. For this
reason, in order to make the assignment more objective, it would be desirable to
automatically infer the paper topics rather than asking the authors to explicitly
provide such information.
While the topics selected by (or inferred for) a reviewer refer to his background
competencies, in some cases reviewers may have specific preferences about papers
due to matters of taste or other vaguer considerations (e.g., the reviewer would
like to review a paper just out of curiosity; the abstract is imprecise or
misleading, etc.). For this reason, the bidding-preferences approach is sometimes
preferred over the expertise-based one. We take both into account, but give priority
to the one based on reviewer expertise, assuming that if a paper bid on by a
reviewer does not match his topics of expertise, this should be considered a
warning. In this regard, a small pattern language has been defined in the literature
that captures successful practice in several conference review processes [8].
In this work two patterns are followed, indicating that papers should be matched,
and assigned for evaluation, to reviewers who are competent in the specific pa-
per topics (ExpertsReviewPapers), and to reviewers who declared to be willing
to review those papers in the bidding phase (ChampionsReviewPapers).
This work aims at showing how this complex real-world domain can take
advantage of intelligent techniques for indexing and retrieving documents and
their associated topics. Specifically, it describes an intelligent component devel-
oped to be embedded in scientific Conference Management Systems that is able
to automatically:
• identify paper topics, among those of interest for the conference, by exploiting the paper title and abstract;
• identify reviewers' expertise, among the conference topics, by exploiting the titles of the reviewers' publications available on the Internet;
• assign reviewers to papers submitted to a conference.
The identification of paper topics and reviewers' expertise is performed start-
ing from the output of an automatic system for document analysis and then
exploiting NLP methods for automatically extracting significant topics. Then,
the assignment process is performed by an expert system that takes as input
this information. Thus, the methods that we propose aim at exploiting intelligent
techniques in the ExpertsReviewPapers pattern, which so far was applicable
only if some steps were manually performed by the users.
2 Reviewers Assignment: the General Framework
In order to perform the assignment, the Chair needs to know both the conference
topics selected for each submitted paper and the topics that best describe
the reviewers' expertise. In the following we show how it is possible to automatically
acquire such knowledge by means of the Latent Semantic Indexing (LSI)
technique. As regards the papers, a system for the automatic processing of the
submitted documents will be presented. It will be exploited in order to automatically
extract the significant components, i.e. title and abstract, from the paper
without the author doing it manually. As concerns the reviewers, the information
needed for the application of LSI was extracted from the online repository
of their publications (at the moment this task is carried out manually). Subsequently,
an expert system that automatically performs the assignments based on
the extracted knowledge about the paper/reviewer topics will be presented.
2.1 Latent Semantic Indexing
A problem of most existing word-based retrieval systems is their ineffectiveness
in finding interesting documents when users do not use the
same words by which the information they seek has been indexed. This is due
to a number of tricky features that are typical of natural language. One of the
most common is the fact that there are many ways (words) to express
a given concept (synonymy), and hence the terms in a user's query might not
match those of a document even if it could be very interesting to him. Another
is that many words have multiple meanings (polysemy), so that terms in a
user's query will literally match terms in documents that are not semantically
interesting to the user.
The LSI technique [3] tries to overcome the weaknesses of term-matching
based retrieval by treating the unreliability of observed term-document associ-
ation data as a statistical problem. Indeed, LSI assumes that there exists some
underlying latent semantic structure in the data that is partially obscured by
the randomness of word choice with respect to the retrieval phase and that can
be estimated by means of statistical techniques. LSI relies on a mathematical
technique called Singular-Value Decomposition (SVD). Starting from a (large
and usually sparse) matrix of term-document association data, the SVD makes it
possible to build and arrange a semantic space, where terms and documents that
are closely associated are placed near each other, in such a way as to reflect the
major associative patterns in the data and to ignore the smaller, less important
influences. As a result, terms that do not actually appear in a document may
still end up close to it, if this is consistent with the major association patterns in
the data. Position in the space thus serves as a new kind of semantic indexing,
and retrieval proceeds by using the terms in a query to identify a point in the
4. space, and returning to the user documents in its neighbourhood. It is possible to
specify a reduction parameter that intuitively represents the number of different
concepts to be taken into account, among which distributing the available terms
and documents.
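To make the idea concrete, here is a minimal sketch of LSI with a truncated SVD (NumPy assumed). This is not the paper's code: the term-document matrix, vocabulary and query below are invented toy data.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents);
# counts and vocabulary are illustrative only.
terms = ["review", "paper", "layout", "analysis"]
A = np.array([[2., 0., 1., 0.],
              [1., 0., 2., 0.],
              [0., 3., 0., 1.],
              [0., 1., 0., 2.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                       # reduction parameter: number of latent concepts
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Each document becomes a point in the k-dimensional semantic space.
docs = Vt_k.T               # shape (n_docs, k)

# A query is mapped into the same space: q_hat = q^T U_k S_k^{-1}.
q = np.array([1., 1., 0., 0.])        # query: "review paper"
q_hat = (q @ U_k) / s_k

def cos(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

sims = [cos(q_hat, d) for d in docs]
# Documents 0 and 2 (which use "review"/"paper") land near the query
# in the concept space; documents 1 and 3 ("layout"/"analysis") do not.
```

Retrieval then simply returns the documents in the neighbourhood of the query point, ranked by cosine similarity.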
The large amount of items that a document management system has to deal
with, and the continuous flow of new documents that could be added to the
initial database, require an incremental methodology to update the initial LSI
matrix. Indeed, applying the LSI method from scratch at each update, taking
into account both the old (already analysed) and the new documents, would
become computationally inefficient. Two techniques have been developed in the
literature to update (i.e., add new terms and/or documents to) an existing LSI
generated database: Folding-In [1] and SVD-Updating [9]. The former is a much
simpler alternative that uses the existing SVD to represent new information
but yields poor-quality updated matrices, since the information contained in the
new documents/terms is not exploited by the updated semantic space. The latter
represents a trade-off between the former and the recomputation from scratch.
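Folding-In can be sketched in a few lines (NumPy assumed, toy data; SVD-Updating is more involved and omitted here): a new document vector is simply projected into the existing semantic space.

```python
import numpy as np

# Toy rank-2 term-document matrix; values invented for illustration.
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 0., 1.],
              [0., 1., 1.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, s_k = U[:, :k], s[:k]

# Folding-In: project a new document d into the existing k-dimensional
# space without recomputing the SVD: d_hat = d^T U_k S_k^{-1}.
d = np.array([1., 0., 1., 0.])        # new document using terms 0 and 2
d_hat = (d @ U_k) / s_k

# Since d equals the first column of A and A has rank 2, folding it in
# reproduces exactly that document's existing coordinates.
assert np.allclose(d_hat, Vt[:k, 0])
```

This cheapness is exactly why Folding-In yields poor-quality matrices over time: the projection never feeds the new information back into the semantic space itself.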
2.2 The Document Management System
This section presents the current version of DOMINUS (DOcument Management
INtelligent Universal System) [5], a system for automated electronic documents
processing characterized by the intensive exploitation of intelligent techniques
in each step of the document management process: acquisition, layout analysis,
document image understanding, indexing, for categorization and information
retrieval purposes. It can deal with documents in standard formats, such as
PostScript (PS) or its evolution Portable Document Format (PDF).
The layout analysis process on documents in electronic format, sketched in
Figure 1, is now reported along with the steps performed by the system going
from the original PDF/PS document to the text extraction and indexing.
1. WINE: Rewrites basic PostScript operators to turn their drawing instructions
into objects. It takes as input a PDF/PS document and produces (via an
intermediate vector format) the document's initial basic XML representation,
which describes it as a set of pages made up of basic blocks.
2. Rewriting rules: Identifies rewriting rules that could suggest how to set some
parameters in order to group together rectangles (words) to obtain lines.
Specifically, such a learning task was cast to a Multiple Instance Problem
(MIP) and solved by exploiting the kernel-based method proposed in [4].
3. DOC: Collects semantically related basic blocks into groups by identifying
frames that surround them based on whitespace and background structure
analysis. This is a variant of Breuel's algorithm [2], which iteratively finds
the maximal white rectangles in a page. The modification consisted of a
bottom-up grouping of basic blocks into words and lines and of the empirical
identification of a stop criterion to end the process before finding insignificant
white spaces such as inter-word or inter-line ones.
Fig. 1. Document Management System
4. Layout Correction: At the end of the previous step it could be possible that
some blocks are not correctly recognized, i.e. background areas are considered
content ones and vice versa. In such a case a phase of layout correction
is needed, that is automatically performed in DOC by applying embedded
rules automatically learned for this task. To this purpose, we firstly collect
the manual corrections performed on some documents and describe them by
means of a first-order language representing both the situations before and
after the manual correction, then we exploit INTHELEX [6] (a first-order
logic learner) on this training set in order to identify correction rules.
5. Classification: Associates the document to a class that expresses its type (e.g.,
scientific/newspaper article, etc.). Since the logical structure obviously differs
according to the kind of document, classification of the document is a
preliminary step before recognizing the relevant components for that document
(e.g., a sender is significant for a mail but not for a newspaper article).
INTHELEX is also exploited to learn rules for the automatic identification of
the class.
6. Understanding: Identifies the significant layout components for the class pre-
viously recognized and associates to each of them a tag that expresses its
role (e.g., title, author, abstract, etc.). Again, INTHELEX is exploited to
learn rules for the automatic identification of the logical components.
7. Extraction: Extracts the text from the significant components.
8. Indexing: Exploits the Latent Semantic Indexing technique to index the doc-
uments.
The following scenario can give an idea of how DOMINUS can be exploited in
the submission phase, and of what advantages it can bring to the involved people.
An Author connects to the Internet and (after registering, or after logging in if
already registered) opens the submission page, where he can browse his hard disk
and submit a paper by choosing the corresponding file in one of the accepted
formats. The paper is received and undergoes the various processing steps. The
layout analysis algorithm is applied, in order to single out its layout components.
Then, it is translated into a first-order logic description and classified by a proper
module according to the theory learned so far for the acceptable submission
layout standards (e.g., full paper, poster, demo). Depending on the identified
class, a further step exploits the same description to locate and label the layout
components of interest for that class (e.g., title, author, abstract and references
in a full paper). The text that makes up each of such components is read, stored
and used to automatically file the submission record (e.g., by filling its title,
authors and abstract fields).
If the system is unable to carry out any of these steps, the event is notified
to the Conference administrators, who can manually fix the problem and let
the system complete its task. Such manual corrections are logged and used by
the incremental learning component to refine the available classification/labeling
theories in order to improve their performance on future submissions. Neverthe-
less, this is done off-line, and the updated theory replaces the old one only after
the learning step has been successfully completed: this allows further submis-
sions to take place in the meantime, and makes the refinement step transparent
to the Authors. Alternatively, the fixes can be logged and exploited all at once to
refine the theory when its performance falls below a given threshold. Subsequently,
a categorization of the paper content according to the text read is performed,
with the purpose of allowing the paper topics to be matched against the reviewers'
expertise, in order to find the best associations for the final assignment. Specif-
ically, the text contained in the title and abstract is exploited, since we assume
they compactly summarize the subject and research field the paper is concerned
with, respectively.
2.3 The Papers-Reviewers Assignment Phase
GRAPE (Global Review Assignment Processing Engine) [7] is an expert system,
written in CLIPS, for solving the reviewer assignment problem, which takes
advantage of both the papers' content (topics) and the reviewers' expertise and
preferences (biddings). It can be used by exploiting, in addition to the paper
topics, the reviewers' expertise only, or both the reviewers' expertise and biddings.
In the following a brief description of the system is given.
Let P = {p1, . . . , pn} denote the set of n papers submitted to the conference
C, regarding t topics (conference topics, TC), and R = {r1, . . . , rm} the set of m
reviewers. The goal is to assign the papers to reviewers, such that the following
basic constraints are fulfilled:
1. each paper is assigned to exactly k reviewers (usually, k is set to 3 or 4);
2. each reviewer should have roughly the same number of papers to review (the
mean number of reviews per reviewer is equal to nk/m);
3. papers should be reviewed by domain experts;
4. reviewers should review articles based on their expertise and preferences.
As regards constraint 2, GRAPE can take as input additional constraints indicat-
ing that some specific reviewer r must review at most h papers. These constraints
override the general principle and must be taken into account for calculating the
mean number of reviews for the other reviewers.
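As an illustration (not the paper's code; GRAPE itself is a CLIPS rule-based system), constraint 2 with per-reviewer overrides can be sketched in Python. For this sketch we assume a capped reviewer takes exactly his maximum h slots, an assumption the paper does not state.

```python
def mean_reviews(n_papers, k, n_reviewers, overrides=None):
    """Mean number of reviews per unconstrained reviewer.

    Without overrides this is simply n*k/m (constraint 2). Reviewers with a
    "review at most h papers" override are assumed here to take exactly h
    review slots; the remaining slots are spread over the other reviewers.
    """
    overrides = overrides or {}          # {reviewer_id: h}
    total_slots = n_papers * k
    reserved = sum(overrides.values())
    free_reviewers = n_reviewers - len(overrides)
    return (total_slots - reserved) / free_reviewers

# Plain case, the paper's formula nk/m:
mean_reviews(264, 2, 60)                       # -> 8.8
# Two reviewers capped at 3 reviews each; the rest absorb the load:
mean_reviews(264, 2, 60, {"r1": 3, "r2": 3})   # (528 - 6) / 58 = 9.0
```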
Two measures were defined to guide the system during the search for the
best solutions: the reviewer's gratification and the article's coverage. The former
represents the gratification degree of a reviewer, calculated on the basis of the
papers assigned to him. It is based on the confidence degree between the reviewer
and the assigned articles (the confidence degree between a paper pi concerning
topics Tpi and a reviewer rj expert in topics Trj is defined as the number
of topics in common) and on the number of assigned papers that were actually
bid on by the reviewer. The article's coverage represents the coverage degree of
an article after the assignments. It is based on the confidence degree between
the article and the reviewers it was assigned to (the same as before), and the
expertise degree of the assigned reviewers (represented by the number of topics
in which they are expert, and computed for a reviewer rj as |Trj|/|TC|). GRAPE
tries to maximize both measures during the assignment process, in order to fulfil
basic constraints 3 and 4. To reach this goal a fundamental requirement is
that each reviewer provide at least one topic of preference, otherwise the
article coverage degree would always be null.
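The two degrees just defined are simple set computations; a minimal sketch in Python (the topic names are invented for the example):

```python
def confidence(paper_topics, reviewer_topics):
    # Confidence degree between a paper and a reviewer: the number of
    # topics they have in common.
    return len(set(paper_topics) & set(reviewer_topics))

def expertise(reviewer_topics, conference_topics):
    # Expertise degree of a reviewer: the fraction of conference topics
    # in which he is an expert (|Trj| / |TC|).
    return len(set(reviewer_topics)) / len(set(conference_topics))

TC = {"ML", "NLP", "IR", "KR", "Planning"}     # conference topics (toy)
paper = {"ML", "NLP"}
reviewer = {"NLP", "IR", "KR"}

confidence(paper, reviewer)    # -> 1 (only "NLP" in common)
expertise(reviewer, TC)        # -> 0.6
```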
The assignment process is carried out in two phases. In the former, the system
progressively assigns reviewers to papers with the lowest number of candidate
reviewers. At the same time, the system prefers assigning papers to reviewers
with few assignments. In this way, it avoids having reviewers with zero or few
assigned papers. Hence, this phase can be viewed as a search for review assignments
by keeping low the average number of reviews per reviewer and maximizing the
coverage degree of the papers. In the latter phase, the remaining assignments
are chosen by considering first the confidence levels and then the expertise level
of the reviewers. In particular, given a paper pi which has not been assigned k
reviewers yet, the system tries to assign it to a reviewer rj with a high confidence
level between rj and pi. In case it is not possible, it assigns a reviewer with a
high level of expertise.
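The first assignment phase can be approximated by a greedy procedure. The sketch below is our simplification, not GRAPE itself (which is rule-based and also handles biddings and per-reviewer caps): papers with the fewest candidate reviewers go first, and among candidates the least-loaded reviewer wins, with ties broken by confidence degree.

```python
def assign_first_phase(papers, reviewers, k):
    """papers/reviewers: {id: set of topics}. Assigns k reviewers per paper,
    handling papers with the fewest candidate reviewers first and preferring,
    among candidates, the least-loaded reviewer (ties broken by confidence).
    """
    load = {r: 0 for r in reviewers}
    # Candidate reviewers share at least one topic with the paper.
    cands = {p: [r for r in reviewers if papers[p] & reviewers[r]]
             for p in papers}
    result = {}
    for p in sorted(papers, key=lambda p: len(cands[p])):
        ranked = sorted(cands[p],
                        key=lambda r: (load[r],
                                       -len(papers[p] & reviewers[r])))
        result[p] = ranked[:k]
        for r in result[p]:
            load[r] += 1
    return result

papers = {"p1": {"LSI"}, "p2": {"LSI", "layout"}, "p3": {"layout"}}
reviewers = {"r1": {"LSI"}, "r2": {"LSI", "layout"}, "r3": {"layout"}}
print(assign_first_phase(papers, reviewers, 2))
```

On this toy input every paper gets two topically competent reviewers and the review load stays balanced across reviewers, which is exactly the behaviour the first phase aims for.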
The assignments resulting from the base process are presented to each reviewer,
who receives the list A of the h assigned papers, followed by the list A′
of the remaining ones, in order to actually place his bids. When all the
reviewers have bid on their papers, GRAPE searches for a new solution that takes into
account these biddings as well, in addition to the information about expertise.
In particular, it tries to change previous assignments in order to maximize both
the article's coverage and the reviewer's gratification. By keeping the article
coverage high, the system tries to assign to each reviewer the same number of
papers bid on with the same class. Then, the solution is presented to the reviewers as the
final one.
The main advantage of GRAPE lies in the fact that it is a rule-based system.
Hence, it is very easy to add new rules in order to change or improve its behavior,
and it is possible to describe background knowledge, such as further constraints
or conflicts, in a natural way. For example, one could insert a rule expressing
the preference to assign a reviewer to the articles in which he is cited, assuming
that he should be an expert in those fields.
3 Evaluation
The system was evaluated on a real-world dataset built by using data from the
18th Conference on Industrial & Engineering Applications of Artificial Intelli-
gence & Expert Systems (IEA/AIE 2005), whose Call for Papers identified 34
topics of interest. The papers submitted were 264 and the reviewers 60. Since
the objective was to assess the performance of the automatic topic recognition
methodology, in this case only the reviewers' expertise, and not their biddings,
was exploited by the paper assignment system.
The following steps were carried out. Firstly, the layout of each paper was
automatically analyzed in order to recognize the significant components. In par-
ticular, the abstract and title were considered the most representative of the
document subject, and hence the corresponding text was read. The words con-
tained therein were stemmed according to the technique proposed by Porter [10],
resulting in a total of 2832 word stems, on which the LSI technique was applied
in order to index the whole set of documents. Then, the same procedure was
applied to index the reviewers, resulting in 2204 stems. In this case, the titles
of their papers appearing in the DBLP Computer Science Bibliography repository
(http://www.informatik.uni-trier.de/~ley/db/) were exploited. With respect
to exploiting the research-interest information on their homepages, this ensured
a more uniform description. Compared to manually selecting the titles of their
publications, this ensured more completeness, even if at the cost of not having
the abstracts available as well.
In both cases, the LSI parameters were set in such a way that all the con-
ference topics were covered as different concepts. The experiment consisted in
performing 34 queries, each corresponding to one conference topic, on both the
papers and the reviewers in the database previously indexed, and then in associ-
ating to each paper/reviewer the topics for which it/he appears among the first
l results. Specifically, the results on document topic recognition showed that 88
documents per query had to be considered, in order to include the whole set
of documents. However, returning just 30 documents per query, 257 out of 264
documents (97.3%) were already assigned to at least one topic, which is an ac-
ceptable trade-off (the remaining 7 documents can be easily assigned by hand).
Thus, 30 documents were considered a good parameter, and exploited to count
the distribution of the topics between the documents. Interestingly, more than
half of the documents (54.7%) concern between 2 and 4 topics, which could be
expected both for the current interest of the researchers in mixing together dif-
ferent research areas and for the nature of the topics, that are not completely
disjoint (some are specializations of others). Evaluated by the conference orga-
nizers, the result showed a 79% accuracy on average. As to the reviewers, even
if taking l = 6 already ensured at least one topic for each of them, we adopted
a more cautious approach and took l = 10, in order to balance the possible
inaccuracy due to considering only the titles of their publications. The resulting
accuracy was 65%.
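The "first l results" rule used above amounts to the following (our sketch, with invented topic and item identifiers):

```python
def assign_topics(query_results, l):
    """query_results maps each conference topic to the ranked list of items
    (papers or reviewers) returned by the corresponding LSI query. An item
    receives every topic for which it appears among the first l results.
    """
    assigned = {}
    for topic, ranking in query_results.items():
        for item in ranking[:l]:
            assigned.setdefault(item, set()).add(topic)
    return assigned

results = {"NLP":    ["p1", "p2", "p3"],
           "layout": ["p3", "p1", "p2"]}
assign_topics(results, l=2)
# -> {"p1": {"NLP", "layout"}, "p2": {"NLP"}, "p3": {"layout"}}
```

Raising l trades precision for coverage, which is exactly the trade-off discussed above (l = 30 for papers, l = 10 for reviewers).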
Lastly, the topics automatically associated to papers and reviewers were fed
to GRAPE in order to perform the associations, with the requirement to assign
each paper to 2 reviewers. In order to have an insight on the quality of the re-
sults, in the following we present some interesting figures concerning GRAPE's
outcome. In solving the problem, the system was able to complete its task in 120
seconds. GRAPE was always able to assign papers to reviewers by considering the
topics only, except in two cases. In particular, except for those reviewers that
explicitly asked to review less than 10 papers (MaxReviewsPerReviewer con-
straint), it assigned 10 papers to 40 reviewers, 9 to 2 reviewers, 8 to 3 reviewers,
7 and 6 to one reviewer. The experts considered the final associations made by
GRAPE very helpful, since they would have changed just 7% of them.
4 Conclusions and Future Work
This paper proposed the application of intelligent techniques to support the
various phases required to automate the task of paper-reviewer assignment
in scientific conference management. Experiments on a real domain
prove the viability of the proposed approach.
Different future work directions are planned for the proposed system. First,
the conference management system will be extended to cover other knowledge-
intensive tasks currently in charge of the organizers, such as partitioning and
scheduling the final presentations according to the paper subjects. Second, the automatic
processing of the bibliographic references of the papers and of the publications of
the reviewers will be addressed. Furthermore, we plan to process the reviewers'
home pages to discover all the information needed for their registration in order to
automatically fill in all the fields in the registration form (i.e., affiliation, research
interests, etc.). Then, in a more general perspective, the proposed techniques
will be applied to the problem of matching the documents in a digital library to
the interests of the library users. The use of ontologies for improving matching
effectiveness will be investigated as well.
References
1. Michael W. Berry, Susan T. Dumais, and Gavin W. O'Brien. Using linear algebra
for intelligent information retrieval. SIAM Rev., 37(4):573–595, 1995.
2. Thomas M. Breuel. Two geometric algorithms for layout analysis. In Workshop
on Document Analysis Systems, 2002.
3. Scott C. Deerwester, Susan T. Dumais, Thomas K. Landauer, George W. Furnas,
and Richard A. Harshman. Indexing by latent semantic analysis. Journal of the
American Society for Information Science, 41(6):391–407, 1990.
4. Thomas G. Dietterich, Richard H. Lathrop, and Tomas Lozano-Perez. Solving
the multiple instance problem with axis-parallel rectangles. Artificial Intelligence,
89(1-2):31–71, 1997.
5. Floriana Esposito, Stefano Ferilli, Teresa Maria Altomare Basile, and Nicola Di
Mauro. Semantic-based access to digital document databases. In Foundations of
Intelligent Systems, 15th International Symposium (ISMIS 2005), volume 3488 of
Lecture Notes in Computer Science, pages 373–381. Springer Verlag, 2005.
6. Floriana Esposito, Stefano Ferilli, Nicola Fanizzi, Teresa M.A. Basile, and Nicola
Di Mauro. Incremental multistrategy learning for document processing. Applied
Artificial Intelligence: An International Journal, 17(8/9):859–883, September-
October 2003.
7. Nicola Di Mauro, Teresa Maria Altomare Basile, and Stefano Ferilli. GRAPE: An
expert review assignment component for scientific conference management systems.
In Innovations in Applied Artificial Intelligence: 18th International Conference
on Industrial and Engineering Applications of Artificial Intelligence and Expert
Systems (IEA/AIE 2005), volume 3533 of Lecture Notes in Computer Science,
pages 789–798. Springer Verlag, 2005.
8. Oscar Nierstrasz. Identify the champion. In N. Harrison, B. Foote, and H. Rohnert,
editors, Pattern Languages of Program Design, volume 4, pages 539–556. Addison
Wesley, 2000.
9. Gavin W. O'Brien. Information management tools for updating an SVD-encoded
indexing scheme. Technical Report UT-CS-94-258, University of Tennessee, 1994.
10. Martin F. Porter. An algorithm for suffix stripping. In K. Sparck Jones and
P. Willett, editors, Readings in Information Retrieval, pages 313–316. Morgan
Kaufmann Publishers Inc., San Francisco, CA, USA, 1997.