This document summarizes a research paper that evaluated the impact of removing less important terms on sentiment analysis. Sentiment analysis is an important natural language processing task, but it is complicated by linguistic challenges. Supervised machine learning is commonly used but requires high-quality training data. The paper experiments with identifying and removing less important words, such as stopwords and terms tagged with supporting parts of speech, from the training data to see whether this improves the precision of a sentiment classification model. The results showed that removing some unimportant words improved precision on a large generic dataset but not on a smaller, context-specific one.
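As a minimal illustration of the kind of preprocessing the paper evaluates (not the authors' own pipeline), stopwords can be stripped from training text before vectorization. The stopword list below is a small hand-picked sample for demonstration, not the full list a real pipeline would use:

```python
# Illustrative sketch of stopword removal prior to training a sentiment
# classifier; the stopword set here is a tiny hand-picked sample.
STOPWORDS = {"the", "a", "an", "is", "was", "it", "this", "of", "and", "to"}

def remove_stopwords(text: str) -> str:
    """Drop stopwords from a whitespace-tokenized text, keeping word order."""
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)

print(remove_stopwords("The movie was a complete waste of time"))
# → "movie complete waste time"
```

A real experiment along the paper's lines would apply this (plus POS-based filtering) to the training corpus only, then compare classifier precision with and without the removal step.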
ANALYZING AND IDENTIFYING FAKE NEWS USING ARTIFICIAL INTELLIGENCE (IAEME Publication)
The main reason for the spread of fake news is the large number of fake and hyperpartisan sites on the Internet. These sites manipulate the truth, creating misunderstanding in society. It is therefore important to detect fake news and make people aware of the truth. This paper gives an insight into how to detect fake news using machine learning and deep learning techniques. On examining our data, we categorized it into five attributes: Title, Text, Subject, Date, and Labels. To develop an efficient fake news detection system, each feature and its degree of impact on the system must be taken into consideration. This paper provides a detailed analysis of fake news detection using models such as LSTM, ANN, Naïve Bayes, SVM, Logistic Regression, XGBoost, and BERT.
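One of the classical baselines among the models listed (Logistic Regression over TF-IDF features) can be sketched as follows; the toy headlines and labels below are invented for illustration and are not the paper's dataset:

```python
# Minimal sketch of a TF-IDF + Logistic Regression fake news baseline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "scientists confirm study results in peer reviewed journal",
    "government releases official economic report",
    "shocking secret cure doctors don't want you to know",
    "you won't believe this one weird trick billionaires hide",
]
train_labels = [0, 0, 1, 1]  # 0 = real, 1 = fake

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)
print(model.predict(["shocking trick doctors hide"])[0])
```

A real study would use the five attributes described above (Title, Text, Subject, Date, Labels) and compare this baseline against the LSTM, BERT, and other models on a proper train/test split.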
For this project, we had to research a topic regarded as a relevant area of study in Enterprise Systems and consider how it will be applicable in the future.
We chose to study the effects artificial intelligence will have on CRM systems. Our findings are presented in a video here - https://www.youtube.com/watch?v=Fe55c60QPwY&t=9s
A Study on the Applications and Impact of Artificial Intelligence in E Commer... (ijtsrd)
Trends in computer science show that various aspects of Artificial Intelligence are emerging, and that these advances are being applied to create intelligent information systems. Artificial intelligence is changing the ways in which computers can be used as problem-solving tools; the smart creation and operation of tools is indeed a hallmark of human intelligence. This technology has now been adopted by various e-commerce websites to identify customer preferences, previous purchases, frequent visits, and so on. Google and Microsoft are also investing in artificial intelligence in various forms to improve customer service. The main aim of the study is to analyze and explore the various applications and the impact of artificial intelligence in the e-commerce industry. The study concludes that replacing human experts with artificial intelligence systems in the e-commerce industry can significantly speed up, and reduce the cost of, the production or service process. Prof. Lakshmi Narayan. N | Naveena. N "A Study on the Applications and Impact of Artificial Intelligence in E-Commerce Industry" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5, August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26374.pdf Paper URL: https://www.ijtsrd.com/computer-science/artificial-intelligence/26374/a-study-on-the-applications-and-impact-of-artificial-intelligence-in-e-commerce-industry/prof-lakshmi-narayan-n
An overview of information extraction techniques for legal document analysis ... (IJECEIAES)
In the Indian legal system, different courts publish their legal proceedings every month for future reference by legal experts and the general public. Extensive manual labor and time are required to analyze and process the information stored in these lengthy, complex legal documents. Automatic legal document processing overcomes the drawbacks of manual processing and can help the common man better understand the legal domain. In this paper, we explore recent advances in the field of legal text processing and provide a comparative analysis of the approaches used for it. We divide the approaches into three classes: NLP-based, deep learning-based, and KBP-based. We put special emphasis on the KBP approach, as we strongly believe it can handle the complexities of the legal domain well. We finally discuss some possible future research directions for legal document analysis and processing.
There are essential security considerations in the systems used by semiconductor companies like TI. Along with other semiconductor companies, TI has recognized that IT security is crucial throughout web application developers' system development life cycle (SDLC). The challenges faced by TI web developers were consolidated via questionnaires, starting with how risk management and secure coding can be reinforced in the SDLC, and how to achieve IT security, PM, and SDLC initiatives by developing a prototype, which was then evaluated against those goals. This study aimed to apply NIST strategies by integrating risk management checkpoints into the SDLC; to enforce secure coding using a static code analysis tool by developing a prototype application mapped to IT security goals, project management, and SDLC initiatives; and to evaluate the impact of the proposed solution. The paper discusses how SecureTI was able to satisfy IT security requirements in the SDLC and PM phases.
CHALLENGES FOR MANAGING COMPLEX APPLICATION PORTFOLIOS: A CASE STUDY OF SOUTH... (IJMIT JOURNAL)
This research explores the management challenges posed by complex application portfolios in the public sector and their root causes. It examines Australian public sector organisations through the case of South Australia Police (SAPOL), one of the most significant and mission-critical state government agencies. The exploratory research surfaces some of the key challenges using interviews as the primary data collection source, with archive records, documentation, and direct observation as secondary sources. The paper reports on the information analysed, surfacing eight key issues. It highlights that the organic growth of the technology portfolios, combined with mission criticality, has resulted in many quick fixes that are not aligned with long-term enterprise architectural stability. The integration of mismatched technologies, together with constant business pressure to keep the lights on, leaves no opportunity for the portfolios to be rationalised in an ongoing way. Other issues, and areas for further study, are explored at the end.
This proposed system will help guide students towards career opportunities after 10th, 12th, or graduation, and will show recent industrial trends in the chosen profession. The system is a real-time web-based application that provides students with a discussion forum, real-time job updates from industry, information about industrial events at nearby places, and live chat with professional experts. Users can apply for jobs. Database management, real-time systems, and web-based languages will be used to design this application. The proposed system will provide a direct communication platform between students and industry. It will help students and employees build a professional career and a resume in a format approved by industry. Users can update and share their documents and experience with industry. The system will provide automated verification with the help of network security. Priyanka Bodke | Nikita Kale | Sneha Jha | Vaishnavi Joshi "Real Time Application for Career Guidance" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-3, April 2018, URL: http://www.ijtsrd.com/papers/ijtsrd11525.pdf http://www.ijtsrd.com/engineering/computer-engineering/11525/real-time-application-for-career-guidance/priyanka-bodke
CONNECTING O*NET® DATABASE TO CYBERSECURITY WORKFORCE PROFESSIONAL CERTIFICAT... (IJITE)
The Occupational Information Network (O*NET) is considered the primary source of occupational information in the U.S. I explore here possible uses of O*NET data to inform cybersecurity workforce readiness certification programs. The O*NET database is used to map education requirements and how they relate to the professional certifications required by employers and job designers, in accordance with the National Initiative for Cybersecurity Careers and Studies (NICCS). The search focuses on the "Information Security Analysts" occupation as listed on O*NET, Careeronestop, and the U.S. Bureau of Labor Statistics (BLS), and is finally tied back to the NICCS source work role to identify certification requirements. I found that no site listed any certification as required, desirable, or mandatory. NICCS offered general guidance on potential topics and areas of certification. The Careeronestop site provided the clearest guidance on certification for this role. Professional certifications are still not integrated into the official Cybersecurity Workforce Framework guidance.
A Model for Encryption of a Text Phrase using Genetic Algorithm (ijtsrd)
In any organization it is an essential task to protect data from unauthorized users. Information systems hardware, software, networks, and data resources need to be protected and secured to ensure quality, performance, and integrity. Security management deals with the accuracy, integrity, and safety of information resources. When effective security measures are in place, they can reduce errors, fraud, and losses. In the current work, the authors propose a model for encrypting a text phrase using a genetic algorithm. The entropy inherent in a genetic algorithm is exploited to introduce chaos into a text phrase, thereby rendering it unreadable. The number of crossover points and mutation points determines the strength of the algorithm. A prototype of the model is implemented to test its operational feasibility, and a few test cases are presented. Dr. Poornima G. Naik | Mr. Pandurang M. More | Dr. Girish R. Naik "A Model for Encryption of a Text Phrase using Genetic Algorithm" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Special Issue | Fostering Innovation, Integration and Inclusion Through Interdisciplinary Practices in Management, March 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23063.pdf
Paper URL: https://www.ijtsrd.com/computer-science/data-processing/23063/a-model-for-encryption-of-a-text-phrase-using-genetic-algorithm/dr-poornima-g-naik
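A toy sketch of the idea of using genetic-algorithm operators (crossover and mutation) to scramble a text phrase is shown below. This is not the authors' algorithm; the seeded random key, crossover scheme, and mutation count are all illustrative assumptions, and a real scheme would need an inverse transform to decrypt:

```python
import random

def ga_scramble(text: str, key: int, n_mutations: int = 4) -> str:
    """Toy GA-style scrambler: a seeded single-point crossover between the
    two segments of the phrase, followed by character-swap 'mutations'.
    The key (seed) plays the role of the shared secret."""
    rng = random.Random(key)
    chars = list(text)
    # single-point crossover: swap the two segments around a random point
    point = rng.randrange(1, len(chars))
    chars = chars[point:] + chars[:point]
    # mutations: swap random pairs of positions
    for _ in range(n_mutations):
        i, j = rng.randrange(len(chars)), rng.randrange(len(chars))
        chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)

print(ga_scramble("attack at dawn", key=42))
```

Because every operator is a permutation of character positions, the output is an unreadable rearrangement of exactly the original characters; as the abstract notes, more crossover and mutation points mean a stronger scramble.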
Impact of Expert System as Tools for Efficient Teaching and Learning Process ... (rahulmonikasharma)
Introducing the expert system as a tool in the teaching and learning process in the Nigerian educational system is a much-needed step towards improving that process, even though it comes with some challenges. The advent of the computer has opened the way to Computer Aided Instruction (CAI), of which the expert system is one form. An expert system is a well-known area of artificial intelligence: a computerized tool designed to enhance the quality and availability of the knowledge required in an educational system. Society in general sees CAI and expert systems as inevitable and necessary in teaching and learning. Following the example set by developed countries in knowledge preservation and distribution, it is now necessary for the Nigerian educational system to adopt CAI, and especially expert systems, to replicate the rare knowledge and experience of a few experts in different fields of education and to place the Nigerian educational system on a par with its international counterparts. Although expert systems have enormous benefits, they remain under-established as a useful technology due to limited research and documentation. This work proposes that expert systems be introduced into teaching and learning in the Nigerian educational system, as their advantages over the traditional chalk-and-talk method are innumerable.
The project lets users ask college-related queries and get responses through a chatbot, an artificial conversational entity. The system is a web application that answers students' queries. Students simply chat with the bot; there is no specific format the user has to follow. The system helps students stay updated about college activities.
With chatbots gaining traction and their adoption growing in different verticals (e.g. health, banking, dating), and with users sharing more and more private information with chatbots, studies have started to highlight the privacy risks of chatbots. In this paper, we propose two privacy-preserving approaches for chatbot conversations. The first approach applies 'entity'-based privacy filtering and transformation and can be applied directly on the app (client) side; it requires, however, knowledge of the chatbot design. We present a second scheme, based on Searchable Encryption, that is able to preserve user chat privacy without requiring any knowledge of the chatbot design. Finally, we present experimental results based on a real-life employee help desk chatbot that validate both the need for and the feasibility of the proposed approaches.
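The first, client-side approach can be sketched as entity detection followed by transformation of the detected values before the message leaves the device. The entity patterns below are illustrative assumptions, not the paper's actual filter set:

```python
import re

# Sketch of client-side 'entity'-based privacy filtering: redact
# sensitive values before a message is sent to the chatbot backend.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[ -]\d{3}[ -]\d{4}\b"),
}

def redact(message: str) -> str:
    """Replace each detected entity with a typed placeholder."""
    for label, pattern in PATTERNS.items():
        message = pattern.sub(f"<{label}>", message)
    return message

print(redact("My email is jane.doe@example.com and card 1234 5678 9012 3456"))
# → "My email is <EMAIL> and card <CARD>"
```

The typed placeholders preserve enough structure for the bot to respond sensibly, which is why this approach needs knowledge of the chatbot design; the Searchable Encryption scheme described above avoids that requirement.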
Smart information desk system with voice assistant for universities (IJECEIAES)
This article aims to develop a smart information desk system based on a smart mirror for universities. It is a mirror with the added capability of displaying answers to academic inquiries, such as lecturers' office numbers and hours or exam dates and times, on the mirror surface. In addition, voice recognition was used to answer spoken inquiries with audio responses, to serve all types of users including disabled ones. Furthermore, the system shows general information such as the date, weather, time, and the university map. The smart mirror was connected to an outdoor camera to monitor traffic at the university entrance gate. The system was implemented on a Raspberry Pi 4 Model B connected to a two-way mirror and an infrared (IR) touch frame. The results of this study helped overcome the absence of an information desk at the university, saving users time and effort when requesting important academic information.
AUTOMATED TOOL FOR RESUME CLASSIFICATION USING SEMANTIC ANALYSIS (ijaia)
Recruitment in the IT sector has been on the rise in recent times. Software companies hunt for raw talent straight from colleges through job fairs. The allotment of projects to new recruits is a manual affair, usually carried out by the organization's Human Resources department, and it is costly because it relies mostly on human effort. In recent times, software companies around the world have been leveraging advances in machine learning and artificial intelligence to automate routine enterprise tasks and increase productivity. In this paper, we discuss the design and implementation of a resume classifier application that employs an ensemble-learning-based voting classifier to classify a candidate's profile into a suitable domain based on the interests, work experience, and expertise the candidate mentions in the profile. The model employs topic modelling techniques to introduce a new domain to the list of domains when it fails to reach the confidence threshold for classifying a candidate profile. The Stack Overflow REST APIs are called for profiles that fail the confidence threshold test set in the application. The topics returned by the APIs are subjected to topic modelling to obtain a new domain, on which the voting classifier is retrained at a fixed interval to improve the accuracy of the model. Overall, the emphasis is on building a dynamic machine learning automation tool that is not solely dependent on the training data when allotting projects to new recruits. We extend our previous work with a new learning model that can classify resumes with better accuracy and support more new domains.
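The core routing step, a voting classifier over candidate-profile text, can be sketched as below. The tiny synthetic profiles, domain labels, and choice of base estimators are invented for illustration and are not the paper's actual ensemble:

```python
# Sketch of an ensemble voting classifier that routes candidate profiles
# to domains, in the spirit of the approach described above.
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

profiles = [
    "python pandas machine learning models data analysis",
    "tensorflow deep learning neural networks data",
    "react javascript css frontend web development",
    "html javascript responsive web design frontend",
]
domains = ["data-science", "data-science", "web-dev", "web-dev"]

clf = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(
        estimators=[("lr", LogisticRegression()), ("nb", MultinomialNB())],
        voting="soft",  # average predicted probabilities across models
    ),
)
clf.fit(profiles, domains)
print(clf.predict(["experience with javascript and frontend frameworks"])[0])
```

In the full system described above, profiles whose top predicted probability falls below a confidence threshold would be sent to topic modelling to propose a new domain, and the ensemble retrained.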
SEARCH FOR ANSWERS IN DOMAIN-SPECIFIC SUPPORTED BY INTELLIGENT AGENTS (ijcsit)
Search for answers in specific domains is a new milestone in question answering. Traditionally, question answering has focused on general-domain questions, where the most relevant answers (or passages) are selected according to the type of question and the named entities included in the possible answers. In this paper, we present a novel approach to question answering over specific (or technical) domains. This proposal allows us to answer questions such as "What article is appropriate for ... " or "What are the articles related to ... ", which cannot be answered by a general question answering system. Our approach is based on a set of laws from a specific domain, a large collection of labor laws organized into a hierarchy, and we treat generic concepts such as "article" as semantic categories. Our results on a corpus of the Federal Labor Law show that this approach is effective and highly reliable.
Improving Credit Card Fraud Detection: Using Machine Learning to Profile and ... (Melissa Moody)
Researchers Navin Kasa, Andrew Dahbura, and Charishma Ravoori undertook a capstone project—part of the UVA Data Science Institute Master of Science in Data Science program—that addresses credit card fraud detection through a semi-supervised approach, in which clusters of account profiles are created and used for modeling classifiers.
A DEVELOPMENT FRAMEWORK FOR A CONVERSATIONAL AGENT TO EXPLORE MACHINE LEARNIN... (mlaij)
This study introduces a discussion platform and curriculum designed to help people understand how machines learn. The research shows how to train an agent through dialogue and how to understand, using visualization, the way information is represented. The paper starts by providing a comprehensive definition of AI literacy based on existing research, and integrates a wide range of subject documents into a set of key AI literacy skills for developing user-centered AI. These functional and structural considerations are organized into a conceptual framework based on the literature. The contributions of this paper can be used to initiate discussion and guide future research on AI learning within the computer science community.
NLP (Natural Language Processing) is a mechanism that helps computers understand natural languages like English. In general, computers can understand well-formed data, tables, and the like, but natural language is difficult for them to interpret. NLP helps transform natural language into a form that can be easily processed by modern computers. Financial Tracker is an approach that uses NLP as a tool to sort users' messages into various categories. The application of the approach can be seen at multiple levels. At a personal level, it lets users filter useful financial messages out of a large volume of junk text messages. From an industrial point of view, it can be useful in services such as online loan disbursal, which are hitting the market nowadays. These services attempt to provide online loans to individuals in a fast and convenient manner, but from a business perspective, loan recovery from customers becomes a very important and crucial aspect. As most such services cannot take strict legal action against fraudulent customers, loans should be provided only to customers who deserve them. This is where the model comes into the picture. As a business, we can read users' messages from their inbox (after taking their permission). These messages can be filtered using NLP, which helps differentiate the various types of messages in the user's inbox; the filtered messages can then be used as content for further prediction and analysis of the user's behaviour in cash-related transactions.
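The personal-level filtering step can be illustrated with a simple keyword-pattern filter that separates financial messages from junk. This is a deliberately simplified stand-in for the NLP categorization the approach describes; the keyword list is an invented sample, and a real system would use a trained classifier:

```python
import re

# Illustrative keyword filter for spotting financial SMS messages.
FINANCIAL = re.compile(
    r"\b(debited|credited|balance|loan|emi|account|transaction|payment)\b",
    re.IGNORECASE,
)

def is_financial(message: str) -> bool:
    """True if the message mentions any financial keyword."""
    return bool(FINANCIAL.search(message))

inbox = [
    "Your account was debited Rs. 500 on 12-Mar",
    "50% off on all shoes this weekend!",
    "EMI payment of Rs. 2000 is due tomorrow",
]
print([m for m in inbox if is_financial(m)])
```

Messages that pass this filter would then feed the downstream prediction and analysis of the user's cash-related behaviour described above.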
Dialectal Arabic sentiment analysis based on tree-based pipeline optimizatio... (IJECEIAES)
The heavy involvement of Arabic internet users has resulted in the spread of data written in Arabic, creating a vast research area in natural language processing (NLP). Sentiment analysis is a growing field of research of great importance to everyone, considering its high potential for decision-making and for predicting upcoming actions using texts produced on social networks. The Arabic used on microblogging websites, especially Twitter, is highly informal: it complies with neither standards nor spelling conventions, making it quite challenging for automatic machine-learning techniques. In this paper, we propose a new approach based on AutoML methods to improve the efficiency of the sentiment classification process for dialectal Arabic. The approach was validated through benchmark testing on three datasets that represent three vernacular forms of Arabic. The results show that the presented framework achieves significantly higher accuracy than similar works in the literature.
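The AutoML idea of searching over candidate model pipelines can be approximated in spirit as below. This sketch uses scikit-learn's GridSearchCV rather than the genetic-programming search that tree-based pipeline optimization tools perform, and the tiny English sentiment data is invented for illustration only:

```python
# Small-scale analogue of AutoML pipeline selection: cross-validated
# search over whole estimators and their hyperparameters.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

texts = [
    "this film was wonderful and touching",
    "a wonderful touching story",
    "great acting and a great plot",
    "great fun i loved it",
    "terrible pacing and a boring plot",
    "boring terrible and way too long",
    "i hated the awful dialogue",
    "awful film i hated it",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LogisticRegression())])
search = GridSearchCV(
    pipe,
    param_grid=[  # the search swaps whole estimators in and out
        {"clf": [LogisticRegression()], "clf__C": [0.1, 1.0, 10.0]},
        {"clf": [MultinomialNB()], "clf__alpha": [0.5, 1.0]},
    ],
    cv=2,
)
search.fit(texts, labels)
print(type(search.best_estimator_.named_steps["clf"]).__name__, search.best_score_)
```

A TPOT-style system additionally evolves the structure of the pipeline itself (feature preprocessors, stacked transforms) rather than choosing from a fixed grid, which is what makes it suitable for the noisy dialectal text described above.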
TOP CITED ARTICLES - The International Journal of Multimedia & Its Applicatio... (ijma)
The International Journal of Multimedia & Its Applications (IJMA) is a bimonthly open-access peer-reviewed journal that publishes articles contributing new results in all areas of multimedia and its applications. The journal focuses on all technical and practical aspects of multimedia and its applications. Its goal is to bring together researchers and practitioners from academia and industry to focus on understanding recent developments in this arena and on establishing new collaborations in these areas.
An overview of information extraction techniques for legal document analysis ...IJECEIAES
In an Indian law system, different courts publish their legal proceedings every month for future reference of legal experts and common people. Extensive manual labor and time are required to analyze and process the information stored in these lengthy complex legal documents. Automatic legal document processing is the solution to overcome drawbacks of manual processing and will be very helpful to the common man for a better understanding of a legal domain. In this paper, we are exploring the recent advances in the field of legal text processing and provide a comparative analysis of approaches used for it. In this work, we have divided the approaches into three classes NLP based, deep learning-based and, KBP based approaches. We have put special emphasis on the KBP approach as we strongly believe that this approach can handle the complexities of the legal domain well. We finally discuss some of the possible future research directions for legal document analysis and processing.
There are essential security considerations in the systems used by semiconductor companies like TI. Along
with other semiconductor companies, TI has recognized that IT security is highly crucial during web
application developers' system development life cycle (SDLC). The challenges faced by TI web developers
were consolidated via questionnaires starting with how risk management and secure coding can be
reinforced in SDLC; and how to achieve IT Security, PM and SDLC initiatives by developing a prototype
which was evaluated considering the aforementioned goals. This study aimed to practice NIST strategies
by integrating risk management checkpoints in the SDLC; enforce secure coding using static code analysis
tool by developing a prototype application mapped with IT Security goals, project management and SDLC
initiatives and evaluation of the impact of the proposed solution. This paper discussed how SecureTI was
able to satisfy IT Security requirements in the SDLC and PM phases.
CHALLENGES FOR MANAGING COMPLEX APPLICATION PORTFOLIOS: A CASE STUDY OF SOUTH...IJMIT JOURNAL
This research explores the challenges in management and the root cause for complex application portfolios
in the public sector. It takes Australian public sector organisations with the case of South Australia Police
(SAPOL) for evaluation it being one of the significant and mission critical state government agencies. The
exploratory research surfaces some of the key challenges using interview as primary data collection
source, along with archive records, documentation, and direct observation as secondary sources. This
paper reports on the information analysed surfacing eight key issues. It highlights that the organic growth
of the technology portfolios, with mission criticality has resulted in many quick fixes which are not aligned
with long term enterprise architectural stability. Integration of different mismatched technologies, along
with the pressure from the business to always keep the lights on, does not provide the opportunity for the
portfolios to be rationalised in an ongoing way. Other issues and the areas for further study are explored
at the end.
This proposed system will help in consulting the career opportunities to the students after 10th, 12th or graduation for their bright future and will show the recent industrial trends in that particular profession. In this system we will be working on real time web-based application which will provide students forum for discussion, real time job updates from industry, different industrial events nearby places, live chat with the professional experts. User can apply for the jobs. Database management, real time system and web-based languages will be used design this application. This proposed system will provide the direct communication platform for students with the industry. This system will help the students or employees to build the professional career, resume according to the format approved by industry. User can update and share their documents and experiences with the industry. This system will provide automated verification system with the help of network security. Priyanka Bodke | Nikita Kale | Sneha Jha | Vaishnavi Joshi"Real Time Application for Career Guidance" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2 | Issue-3 , April 2018, URL: http://www.ijtsrd.com/papers/ijtsrd11525.pdf http://www.ijtsrd.com/engineering/computer-engineering/11525/real-time-application-for-career-guidance/priyanka-bodke
CONNECTING O*NET® DATABASE TO CYBERSECURITY WORKFORCE PROFESSIONAL CERTIFICAT...IJITE
The Occupational Information Network O*NET is considered the primary source of occupational
information in the U.S. I explore here possible uses of O*NET data to inform cybersecurity workforce
readiness certification programs. The O*NET database is used to map out education requirements and how
they relate to professional certifications as required by employers and job designers in accordance with the
National Initiative for Cybersecurity Careers and Studies (NICCS). The search focuses on the “Information
Security Analysts” occupation as listed on O*NET, Careeronestop, U.S. Bureau of Labor Statistics (BLS),
and finally tied back to the NICCS source work role to identify certification requirements. I found that no site
listed any certification as required, desirable, or mandatory. NICCS offered general guidance on potential
topics and areas of certification, while the Careeronestop site provided the most useful guidance for this role's certification.
Professional certifications are still not integrated into the official Cybersecurity Workforce Framework
guidance.
A Model for Encryption of a Text Phrase using Genetic Algorithmijtsrd
In any organization it is an essential task to protect data from unauthorized users. Information systems hardware, software, networks, and data resources need to be protected and secured to ensure quality, performance, and integrity. Security management deals with the accuracy, integrity, and safety of information resources; when effective security measures are in place, they can reduce errors, fraud, and losses. In the current work, the authors propose a model for encryption of a text phrase employing a genetic algorithm. The entropy inherently available in a genetic algorithm is exploited to introduce chaos into a text phrase, thereby rendering it unreadable. The number of crossover points and mutation points decides the strength of the algorithm. A prototype of the model was implemented to test its operational feasibility, and a few test cases are presented. Dr. Poornima G. Naik | Mr. Pandurang M. More | Dr. Girish R. Naik "A Model for Encryption of a Text Phrase using Genetic Algorithm" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Special Issue | Fostering Innovation, Integration and Inclusion Through Interdisciplinary Practices in Management, March 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23063.pdf
Paper URL: https://www.ijtsrd.com/computer-science/data-processing/23063/a-model-for-encryption-of-a-text-phrase-using-genetic-algorithm/dr-poornima-g-naik
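The crossover/mutation idea described above can be sketched in a few lines. This is a toy illustration, not the authors' actual model: the phrase is scrambled by swapping randomly chosen segments (crossover) and flipping individual characters (mutation), with the random seed and point counts standing in for the key material a real scheme would manage.

```python
import random

def scramble(phrase, crossover_points=2, mutation_points=2, seed=42):
    """Toy GA-style scrambler: segment swaps (crossover) followed by
    character flips (mutation). With the seed and point counts known,
    the process is reversible in principle."""
    rng = random.Random(seed)
    chars = list(phrase)
    n = len(chars)
    # Crossover: swap two non-overlapping, equal-length segments.
    for _ in range(crossover_points):
        i, j = sorted(rng.sample(range(n), 2))
        length = min(j - i, n - j)
        chars[i:i+length], chars[j:j+length] = chars[j:j+length], chars[i:i+length]
    # Mutation: XOR selected characters with a nonzero value.
    for _ in range(mutation_points):
        k = rng.randrange(n)
        chars[k] = chr(ord(chars[k]) ^ rng.randrange(1, 128))
    return "".join(chars)

print(scramble("attack at dawn"))
```

More crossover and mutation points increase the disorder introduced, which matches the abstract's point that their number decides the strength of the algorithm.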
Impact of Expert System as Tools for Efficient Teaching and Learning Process ...rahulmonikasharma
Introducing an expert system as a tool in the teaching and learning process in the Nigerian educational system is a much-needed step toward improving the process, which currently faces several challenges. The advent of the computer has opened the way to Computer Aided Instruction (CAI), of which the expert system is one form. An expert system is a well-known area of artificial intelligence: a computerized tool designed to enhance the quality and availability of knowledge required in the educational system. Society at large sees CAI and expert systems as inevitable in the teaching and learning process. Borrowing a leaf from the developed world in knowledge preservation and distribution, it has become necessary for the Nigerian educational system to adopt CAI, and especially expert systems, to replicate the rare knowledge and experience of a few experts in different fields of education and to place the Nigerian educational system at par with its international counterparts. Though expert systems have enormous benefits, they remain under-established as a useful technology due to limited research and documentation. This work proposes that expert systems be effectively introduced into the teaching and learning process in the Nigerian educational system, as their advantages over the traditional chalk-and-talk method are innumerable.
This project lets students ask college-related queries and get responses through a chatbot, an artificial conversational entity. The system is a web application that answers student queries. Students simply query the bot through a chat interface; no specific message format is required. The system helps students stay updated about college activities.
Abstract. With chatbots gaining traction and their adoption growing in different verticals, e.g. health, banking, dating, and users sharing more and more private information with chatbots, studies have started to highlight the privacy risks of chatbots. In this paper, we propose two privacy-preserving approaches for chatbot conversations. The first approach applies 'entity'-based privacy filtering and transformation, and can be applied directly on the app (client) side; it however requires knowledge of the chatbot design to be enabled. We present a second scheme, based on Searchable Encryption, that is able to preserve user chat privacy without requiring any knowledge of the chatbot design. Finally, we present some experimental results based on a real-life employee Help Desk chatbot that validate both the need for and the feasibility of the proposed approaches.
Smart information desk system with voice assistant for universities IJECEIAES
This article aims to develop a smart information desk system through a smart mirror for universities. It is a mirror with the extra capability of displaying answers to academic inquiries, such as lecturers' office numbers and hours or exam dates and times, on the mirror surface. In addition, a voice recognition feature answers spoken inquiries with audio responses to serve all types of users, including disabled ones. Furthermore, the system shows general information such as the date, weather, time, and the university map. The smart mirror was connected to an outdoor camera to monitor traffic at the university entrance gate. The system was implemented on a Raspberry Pi 4 Model B connected to a two-way mirror and an infrared (IR) touch frame. The results of this study helped to overcome the absence of an information desk in the university, helping users save time and effort when requesting important academic information.
AUTOMATED TOOL FOR RESUME CLASSIFICATION USING SEMANTIC ANALYSIS ijaia
Recruitment in the IT sector has been on the rise in recent times. Software companies hunt for raw talent right from the colleges through job fairs. The allotment of projects to new recruits is a manual affair, usually carried out by the Human Resources department of the organization, and is a costly one as it relies mostly on human effort. In recent times, software companies around the world have been leveraging advances in machine learning, and artificial intelligence in general, to automate routine tasks in the enterprise and increase productivity. In this paper, we discuss the design and implementation of a resume classifier application which employs an ensemble-learning-based voting classifier to classify a candidate's profile into a suitable domain based on the interests, work experience, and expertise mentioned in the profile. The model employs topic modelling techniques to introduce a new domain to the list of domains when the confidence threshold for classifying a candidate profile is not reached. The Stack Overflow REST APIs are called for profiles that fail the confidence threshold test set in the application. The topics returned by the APIs are subjected to topic modelling to obtain a new domain, on which the voting classifier is retrained at a fixed interval to improve the accuracy of the model. Overall, emphasis is laid on building a dynamic machine learning automation tool that is not solely dependent on the training data when allotting projects to new recruits. We extended our previous work with a new learning model that can classify resumes with better accuracy and support more new domains.
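A stdlib-only sketch of the voting-with-fallback idea follows. The keyword lists, scorers, and threshold are invented for illustration (the paper's classifier is an ensemble trained on real profiles): two simple scorers vote softly on a domain, and a low best confidence yields "unknown", the point where the paper's topic-modelling fallback would kick in.

```python
# Hypothetical keyword profiles per domain; a real system would train
# statistical classifiers on labelled candidate profiles.
DOMAIN_KEYWORDS = {
    "backend":  {"java", "spring", "sql", "api", "microservices"},
    "frontend": {"react", "css", "html", "javascript", "ui"},
    "data":     {"python", "pandas", "pytorch", "model", "statistics"},
}

def keyword_scores(tokens):
    """Scorer 1: fraction of profile tokens matching each domain."""
    return {d: sum(t in kw for t in tokens) / max(len(tokens), 1)
            for d, kw in DOMAIN_KEYWORDS.items()}

def coverage_scores(tokens):
    """Scorer 2: fraction of each domain's keyword set that appears."""
    return {d: len(kw & set(tokens)) / len(kw)
            for d, kw in DOMAIN_KEYWORDS.items()}

def classify(profile, threshold=0.2):
    """Soft voting: average both scorers; fall back to 'unknown'
    (the topic-modelling trigger in the paper's design) when the
    best averaged confidence is below the threshold."""
    tokens = profile.lower().split()
    s1, s2 = keyword_scores(tokens), coverage_scores(tokens)
    votes = {d: (s1[d] + s2[d]) / 2 for d in DOMAIN_KEYWORDS}
    best = max(votes, key=votes.get)
    return best if votes[best] >= threshold else "unknown"

print(classify("java spring api development"))  # backend
print(classify("haiku poetry enthusiast"))      # unknown
```

The "unknown" branch is where a production system would query external sources (as the paper does with Stack Overflow) to mint a new domain and retrain.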
SEARCH FOR ANSWERS IN DOMAIN-SPECIFIC SUPPORTED BY INTELLIGENT AGENTSijcsit
Search for answers in specific domains is a new milestone in question answering. Traditionally, question answering has focused on general domain questions; the most relevant answers (or passages) are selected according to the type of question and the named entities included in the possible answers. In this paper, we present a novel approach to question answering over specific (or technical) domains. This proposal allows us to answer questions such as "What article is appropriate for …" and "What are the articles related to …", which cannot be answered by a general question answering system. Our approach is based on a set of laws of a specific domain, containing a large body of labor legislation organized into a hierarchy. We treat generic concepts such as "article" as semantic categories. Our results on the corpus of the Federal Labor Law show that this approach is effective and highly reliable.
Improving Credit Card Fraud Detection: Using Machine Learning to Profile and ...Melissa Moody
Researchers Navin Kasa, Andrew Dahbura, and Charishma Ravoori undertook a capstone project—part of the UVA Data Science Institute Master of Science in Data Science program—that addresses credit card fraud detection through a semi-supervised approach, in which clusters of account profiles are created and used for modeling classifiers.
A DEVELOPMENT FRAMEWORK FOR A CONVERSATIONAL AGENT TO EXPLORE MACHINE LEARNIN...mlaij
This study aims to introduce a discussion platform and curriculum designed to help people understand how
machines learn. Research shows how to train an agent through dialogue and understand how information
is represented using visualization. This paper starts by providing a comprehensive definition of AI literacy
based on existing research and integrates a wide range of different subject documents into a set of key AI
literacy skills for developing user-centered AI. These functional and structural considerations are organized
into a conceptual framework based on the literature. The contributions of this paper can be used to initiate
discussion and guide future research on AI learning within the computer science community.
NLP (Natural Language Processing) is a mechanism that helps computers understand natural languages like English. Computers readily process well-formed data such as tables, but natural language is much harder for them to interpret; NLP translates natural language into a form that can be easily processed by modern computers. Financial Tracker is an approach that uses NLP as a tool to sort user messages into various categories. The application of the approach can be seen at multiple levels. At a personal level, it allows users to filter useful financial messages out of a large volume of junk text messages. From an industrial point of view, it can support services such as online loan disbursal, which are entering the market today. These services aim to provide online loans to individuals quickly, but from a business perspective, loan recovery from customers becomes a very important and crucial aspect. Since most such services cannot take strict legal action against fraudulent customers, loans should be provided only to customers who deserve them. This is where the model comes into the picture: with the user's permission, a business can read messages from the user's inbox, filter them using NLP to differentiate the various types of messages, and use the result as content for further prediction and analysis of the user's behaviour in cash-related transactions.
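A minimal sketch of the message-categorization idea, with hypothetical regex rules standing in for the trained NLP model the approach describes:

```python
import re

# Hypothetical patterns; a deployed system would use a trained NLP
# classifier rather than hand-written rules like these.
PATTERNS = {
    "debit":  re.compile(r"\b(debited|spent|withdrawn)\b", re.I),
    "credit": re.compile(r"\b(credited|received|deposited)\b", re.I),
    "loan":   re.compile(r"\b(loan|emi|overdue)\b", re.I),
}

def categorize(sms):
    """Return the first matching financial category, else 'other'."""
    for category, pattern in PATTERNS.items():
        if pattern.search(sms):
            return category
    return "other"

print(categorize("INR 500 debited from your account"))  # debit
print(categorize("Meeting at 5 pm"))                    # other
```

The "other" bucket is what the personal-level use case filters out as junk; the financial categories feed the downstream behaviour analysis described above.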
Dialectal Arabic sentiment analysis based on tree-based pipeline optimizatio...IJECEIAES
The heavy involvement of Arabic internet users has resulted in the spread of data written in the Arabic language, creating a vast research area in natural language processing (NLP). Sentiment analysis is a growing field of research of great importance to everyone, considering its high potential for decision-making and predicting upcoming actions using texts produced on social networks. Arabic used on microblogging websites, especially Twitter, is highly informal; it complies with neither standards nor spelling conventions, making it quite challenging for automatic machine-learning techniques. In this paper's scope, we propose a new approach based on AutoML methods to improve the efficiency of the sentiment classification process for dialectal Arabic. The approach was validated through benchmark testing on three different datasets that represent three vernacular forms of Arabic. The obtained results show that the presented framework achieves significantly higher accuracy than similar works in the literature.
TOP CITED ARTICLES - The International Journal of Multimedia & Its Applicatio...ijma
The International Journal of Multimedia & Its Applications (IJMA) is a bimonthly open-access peer-reviewed journal that publishes articles contributing new results in all areas of multimedia and its applications. The journal focuses on all technical and practical aspects of multimedia and its applications. Its goal is to bring together researchers and practitioners from academia and industry to focus on understanding recent developments in this arena and to establish new collaborations in these areas.
Learner Ontological Model for Intelligent Virtual Collaborative Learning Envi...ijceronline
An enacting approach to an intelligent virtual collaborative learning model is explored through the lens of critical ontology. The ontological model enables reuse of domain knowledge and makes that knowledge explicitly available to an agent working as an expert system, which uses the operational knowledge in the collaborative learning environment. The agent uses the ontological model to identify the preliminary competency level of the user. The environment offers personalized education to each learner in accordance with his or her learning preferences and capabilities; the factors considered in identifying learning capability are demographic profile, age, family profile, basic educational qualification, and a basic competency scale. The agent then applies heuristics to determine the effectiveness of the learner by referring to the learner parameters available in the ontological model. The paper describes experience in using an ontological model for collaborative learning that relates and integrates the learner's history in the collaborative learning environment; this history is used by Multi-Objective Grey Situation Decision-Making Theory to infer the user's level of understanding and to produce conditional content for the user.
NLP-based personal learning assistant for school education IJECEIAES
Computer-based knowledge and computation systems are becoming major sources of leverage for multiple industry segments. Hence, educational systems and learning processes across the world are on the cusp of a major digital transformation. This paper explores the concept of an artificial intelligence and natural language processing (NLP) based intelligent tutoring system (ITS) in the context of computer education in primary and secondary schools. One component of an ITS is a learning assistant, which enables students to seek assistance as and when they need it, wherever they are. As part of this research, a pilot prototype chatbot was developed to serve as a learning assistant for the subject Scratch (a graphical utility used to teach school children the concepts of programming). Using an open-source natural language understanding (NLU) library and a Slack-based UI, student queries were input to the chatbot to get the sought explanation as the answer. Through a two-stage testing process, the chatbot's NLP extraction and information retrieval performance were evaluated. The testing results showed that the ontology modelling for such a learning assistant was done relatively accurately, showing its potential to be pursued as a cloud-based solution in future.
Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...IJECEIAES
Research implementations in sentiment analysis are about a decade old and have introduced many significant algorithms, techniques, and frameworks for enhancing its performance. The applicability of sentiment analysis to business and political surveys is quite immense. However, we strongly feel that existing research progress in sentiment analysis is not at par with the demands of the massively increasing dynamic data in the pervasive environment. The degree of problems associated with opinion mining over such forms of data has been less addressed and still leaves major scope for research. This paper reviews existing research trends and some important recent research implementations, and explores some major open issues in sentiment analysis. We believe this manuscript gives a progress report, with a snapshot of the effectiveness of current research techniques, to help upcoming researchers identify the research gap and steer their work in the right direction.
User experience improvement of japanese language mobile learning application ...IJECEIAES
Advances in smartphone technology have led to the strong emergence of mobile learning (m-learning) applications to support foreign language learning, especially for the Japanese language. Whatever the kind of m-learning application, its goal should be to help learners study the Japanese language independently. However, popular Japanese m-learning applications only accommodate enhancing reading, vocabulary, and writing ability, so user experience issues are still prevalent and may affect learning outcomes. In the context of user experience, usability is one of the essential factors in mobile application development for determining the level of an application's user experience. In this paper, we advocate a user experience improvement using a mental model and A/B testing. The mental model is used to reflect the user's inner mode of thinking. A comparative approach was used to investigate the performance of 20 high-grade students with homogeneous backgrounds and coursework. User experience level was measured with a usability approach covering pragmatic and hedonic quality: effectiveness (success rate of task completion), efficiency (task completion time), and satisfaction. The results were then compared with an existing Japanese m-learning application to gather insight into the improvement offered by our proposed method. Experimental results show that both m-learning versions can enhance learner performance on pragmatic attributes. Nevertheless, the study also reveals that an m-learning application that employs the conversational mental model in the learning process is valued more by participants on hedonic qualities. This means that the proposed m-learning application, developed with the mental model in mind and designed using A/B testing, can provide an intuitive conversational learning experience.
Cloud Computing and Content Management Systems : A Case Study in Macedonian E...neirew J
Technologies have become inseparable from our lives, the economy, and society as a whole. For example, clouds provide numerous computing resources that can facilitate our lives, whereas Content Management Systems (CMSs) can provide the right content for the right user. Thus, education must embrace these emerging technologies in order to prepare citizens for the 21st century. The research explored 'if' and 'how' cloud computing influences the application of CMSs, and 'if' and 'how' it fosters the usage of mobile technologies to access cloud resources. The analyses revealed that some respondents have sound experience in using clouds and CMSs. Nevertheless, it was evident that a significant number of respondents have limited or no experience with cloud computing concepts, cloud security, and CMSs. Institutions of the system should update educational policies to enable educational innovation, provide means and support, and continuously update and upgrade educational infrastructure.
Integrated Social Media Knowledge Capture in Medical Domain of IndonesiaTELKOMNIKA JOURNAL
Social media platforms, one of the largest sources of today's data traffic on the Internet, disseminate a vast volume of information, including medical information. A knowledge management system (KMS) approach is applied to capture, maintain, and manage the tacit or explicit knowledge available within social media platforms, an organization's database, a knowledge base, or a document repository. By adding Indonesian Natural Language Processing (InaNLP), machine learning, and data mining approaches, our research proposes a framework theoretically designed to improve on previous social media knowledge capture models and to enhance the accuracy and reliability of the knowledge retrieved. The system is mainly aimed at medical practitioners, giving a quick suggestion of likely diseases based on the early diagnosis already taken. In the current research state, the pre-processing phase of the framework implementation and the knowledge presentation are our main concerns, to maximize the information value for knowledge users and to reduce language issues in texts such as ambiguity, inconsistency, and use of slang vocabulary. Toward this goal, we have designed an algorithm to extract features from the dataset.
Nowadays data volumes are growing rapidly in several domains. Many factors have contributed to this growth, including the proliferation of observational devices, the miniaturization of various sensors, improved logging and tracking of systems, and improvements in the quality and capacity of both disk storage and networks. Analyzing such data provides insights that can guide decision making. To be effective, analysis must be timely and must cope with the scale of the data; the scale and the rates at which data arrive make manual inspection infeasible. As an educational management tool, predictive analytics can help improve the quality of education by letting decision makers address critical issues such as enrollment management and curriculum development. This paper presents an analytical study of this approach's prospects for education planning. The goals of predictive analytics are to produce relevant information, actionable insight, better outcomes, and smarter decisions, and to predict future events by analyzing the volume, veracity, velocity, variety, and value of large amounts of data through interactive exploration.
Graph embedding approach to analyze sentiments on cryptocurrencyIJECEIAES
This paper presents a comprehensive exploration of graph embedding techniques for sentiment analysis. The objective of this study is to enhance the accuracy of sentiment analysis models by leveraging the rich contextual relationships between words in text data. We investigate the application of graph embedding in the context of sentiment analysis, focusing on its effectiveness in capturing the semantic and syntactic information of text. By representing text as a graph and employing graph embedding techniques, we aim to extract meaningful insights and improve the performance of sentiment analysis models. To achieve our goal, we conduct a thorough comparison of graph embedding with traditional word embedding and simple embedding layers. Our experiments demonstrate that the graph embedding model outperforms these conventional models in terms of accuracy, highlighting its potential for sentiment analysis tasks. Furthermore, we address two limitations of graph embedding techniques: handling out-of-vocabulary words and incorporating sentiment shift over time. The findings of this study emphasize the significance of graph embedding techniques in sentiment analysis, offering valuable insights into sentiment analysis within various domains. The results suggest that graph embedding can capture intricate relationships between words, enabling a more nuanced understanding of the sentiment expressed in text data.
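As a small illustration of the "text as a graph" step (a hypothetical sketch, not the paper's pipeline): build a word co-occurrence graph whose weighted edges count nearby word pairs; a graph-embedding method such as random-walk-based node embedding would then be trained on this structure.

```python
from collections import defaultdict

def cooccurrence_graph(sentences, window=2):
    """Build a word co-occurrence graph: nodes are words, and an
    edge's weight counts how often two words appear within `window`
    positions of each other. A graph-embedding step (e.g. node2vec-
    style random walks) would then learn vectors for these nodes."""
    graph = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        words = sent.lower().split()
        for i, w in enumerate(words):
            for v in words[i + 1 : i + 1 + window]:
                if v != w:
                    graph[w][v] += 1
                    graph[v][w] += 1
    return graph

g = cooccurrence_graph(["bitcoin price rally", "bitcoin rally continues"])
print(g["bitcoin"]["rally"])  # 2: the pair co-occurs in both sentences
```

Edge weights like this are one common way a graph representation captures more context than a bag of independent word vectors.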
A prior case study of natural language processing on different domain IJECEIAES
In the present digital world, computers do not understand humans' ordinary language, which is the great barrier between humans and digital systems. Hence, researchers developed advanced technology that provides information to users from digital machines. Natural language processing (NLP) is a branch of AI that has significant implications for the ways computers and humans interact, and it has become an essential technology in bridging the communication gap between humans and digital data. This study presents the necessity of NLP in the current computing world along with different approaches and their applications. It also highlights the key challenges in the development of new NLP models.
A scalable, lexicon based technique for sentiment analysisijfcstjournal
The rapid increase in the volume of sentiment-rich social media on the web has resulted in increased interest among researchers in sentiment analysis and opinion mining. With so much social media available on the web, sentiment analysis is now considered a big data task, and conventional sentiment analysis approaches fail to efficiently handle the vast amount of sentiment data available nowadays. The main focus of the research was to find a technique that can efficiently perform sentiment analysis on big data sets: a technique that can categorize text as positive, negative, or neutral in a fast and accurate manner. In the research, sentiment analysis was performed on a large data set of tweets using Hadoop, and the performance of the technique was measured in terms of speed and accuracy. The experimental results show that the technique exhibits very good efficiency in handling big sentiment data sets.
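The lexicon-based core can be sketched minimally (a hypothetical mini-lexicon, not the paper's implementation; real systems use resources such as AFINN or SentiWordNet with thousands of scored terms): each word is looked up in a scored lexicon and the summed score is mapped to positive, negative, or neutral. In a Hadoop setting like the paper's, this function is embarrassingly parallel and would run as the map task over the tweet collection.

```python
# Hypothetical mini-lexicon of word -> polarity score.
LEXICON = {
    "good": 1, "great": 2, "love": 2, "happy": 1,
    "bad": -1, "terrible": -2, "hate": -2, "sad": -1,
}

def label_tweet(text):
    """Sum per-word lexicon scores and map the total to a class."""
    score = sum(LEXICON.get(w, 0) for w in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(label_tweet("I love this great phone"))  # positive
```

Because each tweet is scored independently, scaling out is just a matter of partitioning the tweet set across mappers.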
Similar to Evaluating the impact of removing less important terms on sentiment analysis (20)
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint: a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests, and test automation can be used to speed up testing.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. The constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Pushing the limits of ePRTC: 100ns holdover for 100 days
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-0792-34-7). 29 July 2019, Puri Pujangga Hotel, Bangi, Selangor, Malaysia. Organized by https://worldconferences.net
EVALUATING THE IMPACT OF REMOVING LESS IMPORTANT TERMS ON
SENTIMENT ANALYSIS
Salhana Amad Darwis, Duc Nghia Pham, Ang Jia Pheng, Ong Hong Hoe
Artificial Intelligence Laboratory,
MIMOS Berhad
{salhana.darwis, nghia.pham, jp.ang, hh.ong}@mimos.my
ABSTRACT
Sentiment analysis is an important task in Natural Language Processing (NLP) that analyses
and predicts people's opinions from textual data. It is a complex process due to its interactions
with the computer science, linguistics, psychology and social science disciplines. There is no
straightforward rule to analyse and predict sentiment. Supervised learning methods, which
learn models from human-labelled examples, are widely used by NLP researchers and experts to
predict sentiment. However, this approach is tricky due to the challenges in ensuring the
quality of the manually labelled training dataset. In this study, we investigated the use of
linguistic factors to improve the model's accuracy. We gathered two datasets: (i) 125,000
annotated sentences from Amazon product reviews, and (ii) 11,250 annotated sentences from
financial news articles. We then pre-processed the data and identified the less important terms
in the datasets, the linguistic features, and their effect on the correctness of the predicted
sentiment. Our experimental results showed that punctuation separation and removal of
supporting POS words improve precision on the larger, generic dataset but not on the
smaller, context-sensitive dataset.
Field of Research: sentiment analysis, supervised learning, NLP, linguistics
----------------------------------------------------------------------------------------------------------------
1. Introduction
Modern technologies enable people to express their opinions, thoughts and feelings about what
they experience and things happening around them through various online channels such as
social media, online surveys, blogs, etc. Such massively growing social data is indeed
meaningful and useful for businesses and organisations to gather feedback on their products
and services, or to study and analyse social issues, psychological issues, political situations,
etc. However, due to the variety, velocity and volume of such data, it is difficult to digest and
summarise social data into a meaningful form [13]. Sentiment analysis, also known as opinion
mining, specifically analyses people's opinions, feelings, and thoughts, which are usually
expressed in textual data through the above channels [10].
Due to the practical value of sentiment analysis in several application areas, more and more
effort has been spent on this task, and the results are very promising. However,
researchers working on it face tremendous challenges when dealing with textual data.
Sentiment analysis is part of NLP, which is known as one of the complex problem areas
in Artificial Intelligence [6]. Solely using linguistic methods for sentiment analysis is not
sufficient, as such methods do not address uncertainties that require tacit knowledge and cannot
learn from past experience when interpreting opinions; both are important elements in analysing
sentiment [4,6]. For example, the words 'I', 'am', 'happy', 'not', 'to', 'the', 'this' and 'movie'
exist in both sentences below, whereby the negative sentiment is expressed sarcastically:
Positive - 'I am happy to watch this movie because the ticket price is not expensive.'
Negative - 'I am happy to leave the cinema because the movie is not good.'
In the above example, using linguistic rules it is difficult to distinguish which words or
phrases determine positive or negative sentiment, as the sarcasm expresses a negative sentiment
only indirectly. Due to such linguistic constraints, many researchers from
both academia and industry prefer machine learning approaches for sentiment analysis
[1,17,19,20], which enable the machine to learn from examples and past experience just as a
human does.
On the other hand, using machine learning on textual data exposes other challenges and
drawbacks, as machine learning methods also have their own limitations:
a) Using a simpler word representation model such as 'bag of words' does not ensure
good performance in sentiment analysis [19]. It looks only at word counts [7],
discarding the sequence of words in a sentence and word-to-word relations. Some important
linguistic information from the words, phrases and sentences may not be identified during
processing.
b) In supervised learning, a good training dataset has to be set up as input so that the machine
learning method, in the NLP context, learns correctly from sufficient examples. Enormous
work is needed to prepare the training dataset; it is tedious and computationally costly,
and the model may still suffer from inaccuracy if less optimal examples are fed
into the training dataset.
c) Using a mathematical model with the word2vec representation approach, for instance, causes
the word vectors to grow as more training data is provided. Omitting pre-processing
and the analysis of linguistic information such as punctuation, word variance, part of
speech, stop words, and noise may lead the word vectors to grow even larger, with higher
dimensions [9,15,16,18]. This increases the complexity of text processing and may
impact the performance of sentiment analysis.
In this paper, we present our work on sentiment analysis using supervised learning for English
text. We employed fastText, an open-sourced text classification algorithm developed by
Facebook. Studying the datasets, we noticed a large variety of words that occurred repeatedly.
In our experimental study, we identified less important words in the
dataset and then evaluated the impact of dimensionality reduction (removing these words
from the dataset) on the performance of fastText. Finally, we provide our conclusions and
suggestions for future work.
2. Machine Learning for Sentiment Analysis
For a system to be categorised as intelligent, it must have the ability to learn. In
developing intelligent systems, researchers are grappling with common bottlenecks in AI:
handling uncertainty and ambiguity, and applying human tacit knowledge [6]. Machine learning
addresses these obstacles through its ability to learn, which conventional computing methods
or rules cannot provide. Over the years, various studies have compared the performance of
methods for sentiment analysis. Popular machine learning methods for sentiment analysis are
Naïve Bayes, k-Nearest Neighbour (k-NN), Support Vector Machine (SVM), Long Short-Term
Memory (LSTM), and Neural Networks (NN) [1,6,17,19,20]. Most of these methods are supervised
learning.
2.1 Supervised Learning
Supervised learning is a machine learning approach whereby the machine learns from labelled
or annotated data. The objective of supervised learning is to build intelligent systems that can
learn from input-output training samples [14]. The most interesting aspect of machine learning
is its learning ability [6], which resembles the natural process of human learning. In sentiment
analysis using supervised learning, selected sample sentences from different scenarios are
annotated with sentiment based on human judgement and used as training examples for the model.
Supervised learning, however, requires a good training dataset as input to ensure an
effective learning process. Learning is one of the most important aspects of intelligence and
is a complex human process. As such, training a machine to learn can lead to complexities and
challenges. Building an effective learning strategy is one of the hardest questions in
supervised learning and neural computing [6]. Hence, preparing a training dataset for
supervised learning can be tricky. As the system learns from more examples, the complexity
increases and the vector size grows with a higher dimension of word representation. Training
time also increases with the growing data. Figure 1 illustrates the distribution of words in a
multi-dimensional vector space upon completion of training in supervised learning.
Figure 1: Illustration of word vector in multiple dimensions upon completion of training a dataset.
2.2 fastText
While many supervised learning methods take a long time to train on large training datasets,
fastText (an open-sourced text classifier by Facebook) was introduced to simplify the
complexity of handling large datasets [2,8]. It improves upon traditional learning methods for
text classification in several ways: (i) it uses hidden layers to benefit from reusable
variables, (ii) it uses hierarchical softmax to optimise the learning process and reduce running
time and computational complexity, and (iii) it uses n-grams instead of 'bags of words' to
capture partial information about local word order, yet remains comparable to other
methods that capture actual word order [2,8].
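To illustrate (iii), word n-grams can be enumerated in a few lines of Python; `word_ngrams` is a hypothetical helper written for this sketch, not part of fastText's API:

```python
def word_ngrams(tokens, n_max):
    """Collect all contiguous word n-grams of length 1..n_max."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

# Bigrams such as "not good" retain local word order that a pure
# bag-of-words representation would discard.
print(word_ngrams("the movie is not good".split(), 2))
```

With `n_max = 2` the sentence yields its five unigrams plus four bigrams, including "not good", so the negation stays attached to the word it modifies.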
3. The Complexity in Understanding Natural Language
Interpreting human language is a complex cognitive process, as it relies not only on the syntactic
aspect of the language used but also on how the language is perceived by humans when
expressing and interpreting thoughts. While various levels of knowledge are involved in the
natural language understanding process, such as syntactic, semantic and pragmatic
knowledge [6], there is no straightforward rule for analysing natural language.
3.1 Part of Speech (POS)
In linguistics, a language is spoken, written and understood by following sets of rules such as
the POS rules. The role of POS has been studied in relation to the psycho-linguistic effect on
humans in constructing and understanding sentences. Due to its importance and vital role in
language, the identification of POS, also known as POS tagging, has become a pre-requisite
step of language analysis in many computational linguistic applications, providing additional
linguistic information before other NLP methods are applied to the text. POS tagging
assigns each word in a sentence a morpho-syntactic category [21]. Table 1
summarises the different POS types and their roles, adapted from an English grammar book [3].
Table 1: Summary of POS types and roles.
Word Class / Role / Example
Noun: identifies a person, a thing, an idea, a quality or a state. Examples: girl, engineer, friend, horse, wall, flower, country, anger, courage, life.
Verb: describes what a person or thing does or what happens. Examples: run, kick, eat.
Adjective: describes a noun, giving more information about the people, animals or things represented by the noun or pronoun. Examples: big, tall, long, hungry, beautiful.
Adverb: gives information about a verb, adjective, or other adverb. Examples: quietly, loudly, badly, accurately.
Pronoun: used in place of a noun that is already known or has already been mentioned, to avoid repeating the noun. Examples: she, him, that, something, them.
Preposition: used in front of nouns or pronouns to show the relationship between them; describes the position, time, or the way in which something is done. Examples: after, in, to, on, with.
Conjunction: used to connect phrases, clauses, and sentences. Examples: and, because, but, for, if, or, when.
Determiner: introduces a noun and is usually used before the noun. Examples: a, an, the, every, this, those.
Exclamation/Interjection: expresses strong emotion, such as surprise, pleasure, or anger. Examples: How wonderful! Hello! Well done!
Despite the crucial role of POS, some POS types play only a supporting role, providing
description to words of other POS classes. Referring to Table 1, the POS classes that
play a supporting role are Adverb, Determiner, Conjunction, Pronoun, and Preposition [17]. In
this paper, we refer to them as 'supporting POS'. Some of these words are used frequently yet
carry little information from a computational linguistic perspective.
3.2 Stop Words
In NLP, stop words (such as 'a', 'an', 'the', 'every', 'this', 'and', and 'because') are words
that carry little information yet commonly appear in textual documents written in natural language.
Stop words are commonly identified from supporting POS such as determiners, pronouns, and
prepositions [9]. Removing stop words is a common step in the pre-processing stage
of NLP. Even though the exhaustive removal of stop words is a debatable practice [5],
research on stop-word removal shows an encouraging trend amongst NLP experts from academia
and industry through various projects [5,11,12].
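A minimal sketch of stop-word removal is below; the `STOP_WORDS` set here is a tiny illustrative list, not the 340-word or 127-word collections used later in this paper:

```python
STOP_WORDS = {"a", "an", "the", "every", "this", "and", "because"}

def remove_stop_words(tokens):
    # Drop tokens that appear in the stop-word list (case-insensitive).
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words("the ticket price is not expensive".split()))
# ['ticket', 'price', 'is', 'not', 'expensive']
```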
4. Methodology
In this research, we focus on the text pre-processing step to identify less important words in the
training datasets that could impair the effectiveness of learning. We study various POS
classes and stop words. In this section, we examine the effect of dimensionality reduction of the
datasets, by removing stop words and other less important words such as words with supporting
POS, on the precision and correctness of sentiment analysis models trained using the fastText
algorithm. The overall sentiment analysis process is illustrated in Figure 2.
Figure 2: The sentiment analysis process. Step 1: raw data (sentences with manual sentiment annotation) as the input file. Step 2: data pre-processing (tokenisation, punctuation separation, POS identification, stop-word removal). Step 3: training dataset. Step 4: sentiment analysis training (supervising the dataset using fastText), producing a word embedding model and a classification model as outputs. Step 5: model testing, reporting precision and recall. Step 6: sentiment prediction on new queries, producing sentiment predictions as output.
4.1 The Datasets
As illustrated in Figure 2, step 1 involves collecting English datasets of sentiment-annotated
sentences. For this research, we prepared the two datasets described below:
(a) Amazon Dataset: consists of 125,000 sentences from Amazon product reviews (written
in informal language). These sentences are annotated with either positive
or negative sentiment. This dataset was split into a training set of 100,000
sentences and a test set of 25,000 sentences.
(b) Financial Dataset: consists of 11,250 sentences from newspaper articles on the financial
market (written in formal language). These sentences are annotated with
positive, negative or neutral sentiment. The dataset was split into a training
set of 9,000 sentences and a test set of 2,250 sentences.
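A shuffled train/test split along these lines can be sketched as follows; this is a generic sketch, as the paper does not state how the split was performed:

```python
import random

def split_dataset(sentences, train_size, seed=0):
    # Shuffle a copy of the annotated sentences, then cut once at train_size.
    rng = random.Random(seed)
    data = list(sentences)
    rng.shuffle(data)
    return data[:train_size], data[train_size:]

# Amazon dataset: 125,000 sentences -> 100,000 train / 25,000 test.
sentences = [f"sentence {i}" for i in range(125_000)]
train, test = split_dataset(sentences, 100_000)
print(len(train), len(test))  # 100000 25000
```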
4.2 Pre-processing
In this work, we focused on the dimensionality reduction of the training and test datasets, as
illustrated in step 2 of Figure 2, through the data pre-processing steps: tokenisation, punctuation
separation, POS identification, stop-word removal, and removal of words with supporting POS.
Note that, during pre-processing, we ensured that punctuation was detached from words to
allow fastText to identify each word as an individual item. For example, the punctuation in the
string 'eat.' was separated into 'eat' and '.'. We refer to this process as punctuation separation.
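Punctuation separation of this kind can be sketched with a regular expression; this is an illustrative implementation, not the exact tokeniser used in the experiments:

```python
import re

def separate_punctuation(text):
    # Keep runs of word characters as tokens and emit each punctuation
    # mark as its own token, e.g. 'eat.' -> 'eat', '.'
    return re.findall(r"\w+|[^\w\s]", text)

print(separate_punctuation("I want to eat."))
# ['I', 'want', 'to', 'eat', '.']
```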
Next, we proceeded with the removal of less important words in two main parts:
First Part:
Firstly, we performed stop-word removal on the dataset in several steps:
1. Removal of the maximum list of stop words, collected from various research sources
and Python (340 unique words) [11,12]
2. Removal of the minimal list of stop words from Python (127 unique words)
3. Removal of the minimal stop words with supporting POS (Determiner, Conjunction,
Preposition and Pronoun), grouped by their respective POS
Second Part:
Secondly, we removed words whose POS plays only a supporting role to the main POS,
referred to as supporting POS, covering the POS types Conjunction, Determiner,
Preposition and Pronoun (as described in Section 3.1). We gathered a maximum list of words
with these supporting POS to be removed and validated their POS categories against the online
Cambridge Dictionary and an English grammar book [3]. These word lists extend the
collection to cover as many words with the respective POS as possible, regardless of whether
they appear in a stop-words list. This enables comparison of the effect of solely removing stop
words versus removing all words with supporting POS. In this second part, we performed only
the following step:
1. Removal of words with supporting POS (Conjunction, Determiner, Preposition and
Pronoun) from the dataset.
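A sketch of this supporting-POS filter is below. In practice a POS tagger assigns the tags; the `TAGS` lookup here is a hand-made toy standing in for a real tagger, and untagged words are kept, both assumptions made for illustration only:

```python
SUPPORTING_POS = {"CONJ", "DET", "PREP", "PRON"}  # the four roles removed here

# Toy POS lookup standing in for a real POS tagger.
TAGS = {"the": "DET", "and": "CONJ", "on": "PREP", "it": "PRON",
        "movie": "NOUN", "is": "VERB", "not": "ADV", "good": "ADJ"}

def remove_supporting_pos(tokens):
    # Keep a token unless its tag is a supporting POS role;
    # words without a tag are kept by default.
    return [t for t in tokens if TAGS.get(t.lower()) not in SUPPORTING_POS]

print(remove_supporting_pos("the movie is not good".split()))
# ['movie', 'is', 'not', 'good']
```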
4.3 Training the Model
In step 4, we trained the model on the pre-processed dataset (produced in step 3) using
fastText. This step was repeated for both the Amazon and Financial datasets. The training
parameters were set to learning rate lr = 1.0, epoch = 25, and word n-gram = 5. This step is
referred to as 'supervised', where all the samples in the dataset are trained using the fastText
classifier algorithm. Upon completion of training, two models were generated: (1) a word
embedding model, and (2) a classification model.
4.4 Evaluating the Model
Finally, in step 5, we evaluated the performance of the newly trained models (generated in
step 4) using the corresponding test set, and calculated the precision and recall of each
sentiment analysis model. In step 6, we performed sentiment prediction queries on
live cases to evaluate the correctness of the sentiment predicted by the generated model.
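The evaluation in step 5 can be sketched as follows. Note that when every sentence receives exactly one predicted label, precision and recall computed this way coincide, which is consistent with the identical precision and recall values in the result tables; this is a generic sketch, while fastText's own evaluation reports precision and recall at k:

```python
def precision_recall(gold, predicted):
    # Single-label case: precision = correct / number of predictions,
    # recall = correct / number of gold labels.
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(predicted), correct / len(gold)

gold = ["positive", "negative", "positive", "neutral"]
pred = ["positive", "negative", "negative", "neutral"]
print(precision_recall(gold, pred))  # (0.75, 0.75)
```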
5. Results & Discussion
The datasets described in Section 4.1 were used to conduct experiments on the effect of
pre-processing the Amazon and Financial datasets on the precision, recall and correctness of
sentiment prediction. We also looked at the impact of pre-processing on the vector size of the
embedding models generated in these tests (by evaluating the vector file size) after the various
pre-processing steps. The results are discussed in the following sections (Sections 5.1 and 5.2).
5.1 Removal of Stop Words and Supporting POS from the Amazon Dataset
Experiment 1: Removal of Stop Words from the Amazon dataset.
Table 2 shows the results of our experiment after removing stop words from the Amazon
dataset. Set A is the baseline, where no change was made to the original Amazon dataset. We
then separated punctuation from words in the original dataset, resulting in set B. This set was
then used to create the subsequent sets C to H, by removing either all the stop words or only
the stop words with a certain POS role.
Table 2: Experiment 1 results: removal of stop words from the Amazon dataset.

Stop-word removal set                 Precision  Recall  Total words  Reduction (%)  Model size (KBs)  Reduction (%)
Set A: Original training dataset      0.907      0.907   9,179,234    -              410,166           -
Set B: Punctuation separation         0.920      0.920   9,179,234    -              249,031           39.280
Set C: Stop words (maximum)           0.896      0.896   4,927,534    46.32          217,425           46.990
Set D: Stop words (minimum)           0.903      0.903   5,401,919    41.15          211,048           48.545
Set E: Stop words (POS Conjunction)   0.919      0.919   8,884,591    3.21           213,034           48.061
Set F: Stop words (POS Determiner)    0.919      0.919   8,199,831    10.67          212,778           48.123
Set G: Stop words (POS Pronoun)       0.919      0.919   8,594,147    6.37           212,721           48.137
Set H: Stop words (POS Preposition)   0.918      0.918   8,314,093    9.42           212,836           48.109
From this experiment, we observed that separating punctuation from words (set B)
contributed the highest precision and recall, at 0.92, while reducing the vector size by 39.28%.
On the other hand, removing stop words from the punctuation-separated dataset had a negative
impact on the performance of fastText (see the results of sets C and D) compared to the original
set A. Meanwhile, removing stop words of a specific POS role on top of punctuation
separation (sets E to H) resulted in slight improvements in precision and recall compared to the
original set A. Of the four supporting POS roles tested, removing stop words with the preposition
role produced the smallest performance boost. However, the difference in precision and recall from
the other three POS roles (which were on par with each other) is negligible (0.001).
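The "Reduction (%)" columns in Table 2 follow from a simple relative-difference calculation against set A, as this small check sketches:

```python
def reduction_pct(before, after):
    # Percentage reduction relative to the baseline value.
    return (before - after) / before * 100

# Set B embedding size: 410,166 KB -> 249,031 KB, close to the reported 39.280%.
print(reduction_pct(410_166, 249_031))
# Set C word count: 9,179,234 -> 4,927,534, close to the reported 46.32%.
print(reduction_pct(9_179_234, 4_927_534))
```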
Table 3: Experiment 2 results: removal of words with supporting POS roles (Determiner, Conjunction, Preposition, or Pronoun) from the Amazon dataset.

Supporting-POS removal set            Precision  Recall  Total words  Reduction (%)  Model size (KBs)  Reduction (%)
Set A: Original training dataset      0.907      0.907   9,179,234    -              410,166           -
Set B: Punctuation separation         0.920      0.920   9,179,234    -              249,031           39.280
Set C: All words (POS Determiner)     0.917      0.917   7,908,000    13.84          212,603           48.166
Set D: All words (POS Conjunction)    0.919      0.919   8,593,953    6.37           212,662           48.152
Set E: All words (POS Preposition)    0.919      0.919   8,214,290    10.51          212,520           48.186
Set F: All words (POS Pronoun)        0.918      0.918   8,664,127    5.61           212,792           48.121
E-PROCEEDING OF THE 6TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER SCIENCE (AICS 2019). (e-ISBN 978-967-0792-34-7). 29 July 2019, Puri Pujangga Hotel, Bangi, Selangor, Malaysia. Organized by https://worldconferences.net
Experiment 2: Removal of All Words with Supporting POS from Amazon dataset.
Table 3 shows our experimental results on removal of words with supporting POS roles (determiner, conjunction, preposition, or pronoun) from the Amazon dataset. Similar to experiment 1, set B was created by separating punctuation from words, and sets C–F were created by removing words with the respective supporting POS role from set B.
The observation from Experiment 1 also applies here: removal of words with supporting POS roles improved fastText's precision and recall slightly compared to its performance on the original set A. However, the impact of these word removals remained slightly less than that of simply separating punctuation, even though these settings significantly reduce the size of the word embedding model. Among the four supporting POS roles tested, removal of determiners produced the highest reduction, at 13.84% of total words. However, its impact on precision and recall was on par with that of removing the other supporting POS roles.
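The removal step used in sets C–F can be sketched as a simple tag-based filter. This is a minimal illustration rather than the authors' actual pipeline: the tiny hand-labelled tag map below is hypothetical and stands in for the output of a real POS tagger.

```python
# Hedged sketch of removing all words that carry a supporting POS role.
# TOY_TAGS is a hypothetical hand-labelled map; in practice a POS tagger
# would supply the tag for each token.
SUPPORTING_POS = {"DET", "CONJ", "PREP", "PRON"}

TOY_TAGS = {
    "the": "DET", "and": "CONJ", "in": "PREP", "it": "PRON",
    "book": "NOUN", "was": "VERB", "great": "ADJ",
}

def remove_supporting_pos(tokens, pos_to_drop):
    # Keep a token unless its tag belongs to one of the roles being removed.
    return [t for t in tokens if TOY_TAGS.get(t, "OTHER") not in pos_to_drop]

tokens = "the book was great and it was fun".split()
print(remove_supporting_pos(tokens, {"DET"}))          # drops "the" only
print(remove_supporting_pos(tokens, SUPPORTING_POS))   # drops "the", "and", "it"
```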
5.2 Removal of All Words with Supporting POS from Financial Dataset
Experiment 3: Removal of Stop Words from Financial Dataset
Table 4 shows the results of our experiment after removing stop words from the Financial dataset. Set A is the baseline, where no change was made to the original Financial dataset. We then separated punctuation from words in the original dataset, resulting in set B. This set was then used to create the subsequent sets C–H by removing either all of the stop words or only the stop words with a certain POS role.
Table 4: Experiment 3 results - Removal of stop words from Financial dataset.
Financial dataset: Stop word removal | Precision | Recall | Total words count | Total words reduction (%) | Model size (KBs) | Model size reduction (%)
Set A - Original Training Dataset | 0.809 | 0.809 | 234,110 | - | 25,301 | -
Set B - Punctuation Separation | 0.809 | 0.809 | 234,110 | - | 16,928 | 33.00
Set C - Stop words (Maximum) | 0.787 | 0.787 | 166,853 | 28.72 | 16,334 | 35.44
Set D - Stop words (Minimum) | 0.793 | 0.793 | 173,139 | 26.04 | 16,498 | 34.79
Set E - Stop words (POS Conjunction) | 0.806 | 0.806 | 232,264 | 0.79 | 16,563 | 34.53
Set F - Stop words (POS Determiner) | 0.801 | 0.801 | 216,902 | 7.35 | 16,799 | 33.60
Set G - Stop words (POS Pronoun) | 0.799 | 0.799 | 232,821 | 0.55 | 16,813 | 33.54
Set H - Stop words (POS Preposition) | 0.794 | 0.794 | 212,009 | 9.44 | 16,552 | 34.59
Table 4 shows that separation of punctuation from words (set B) neither increased nor decreased the performance of fastText compared to its performance on the original dataset (set A). This is in contrast to our finding in experiments 1 and 2, where separation of punctuation from words gave fastText its highest accuracy boost.
In addition, removal of stop words, regardless of whether a POS role filter was applied, decreased the accuracy of fastText on this Financial dataset, with differences ranging from -0.022 (set C: maximum list of stop words) to -0.003 (set E: conjunction). This observation is again in contrast to our finding on the larger Amazon dataset, where removal of only the stop words with a specific POS role helped improve fastText's precision and recall.
Experiment 4: Removal of All Words with Supporting POS from Financial Dataset
Table 5 shows our experimental results on removal of words with supporting POS roles (determiner, conjunction, preposition, or pronoun) from the Financial dataset. Similar to the previous experiments, set B was created by separating punctuation from words, and sets C–F were created by removing words with the respective supporting POS role from set B.
Table 5: Experiment 4 results - Removal of words with supporting POS roles (Determiner, Conjunction, Preposition, or
Pronoun) from Financial dataset.
Financial dataset: Supporting POS removal | Precision | Recall | Total words count | Total words reduction (%) | Model size (KBs) | Model size reduction (%)
Set A - Original Training Dataset | 0.809 | 0.809 | 234,110 | - | 25,301 | -
Set B - Punctuation Separation | 0.809 | 0.809 | 234,110 | - | 16,928 | 33.00
Set C - All words (POS Determiner) | 0.797 | 0.797 | 214,040 | 8.57 | 16,909 | 33.16
Set D - All words (POS Conjunction) | 0.802 | 0.802 | 225,743 | 3.57 | 16,959 | 32.97
Set E - All words (POS Preposition) | 0.789 | 0.789 | 206,336 | 11.86 | 16,869 | 33.33
Set F - All words (POS Pronoun) | 0.802 | 0.802 | 231,503 | 1.12 | 16,974 | 32.91
In contrast to our finding in experiment 2, removal of words with any of the supporting POS roles (determiner, conjunction, preposition, or pronoun) worsened the accuracy of fastText on the Financial dataset, with differences ranging from -0.020 (set E: preposition) to -0.007 (set D: conjunction and set F: pronoun). Here, removal of prepositions (not determiners, as found in experiment 2) produced the highest reduction of 11.86% of total words. It also reduced the word embedding model the most, by 33.33%.
5.3 Discussion
Punctuation separation provided the best result on the Amazon dataset due to the reduction of noise in the original dataset. In the original dataset, punctuation was attached to words, causing unnecessary formation of new tokens, which fastText regarded as new word variants. For example, both tokens "eat" and "eat." exist in the original dataset, so "eat." was treated as a new word. Due to the large data volume of the Amazon dataset, these word variants grow tremendously and increase the size of the word embedding model; punctuation separation eliminated them. On the other hand, our Financial dataset consists of formally written sentences and, given its small size, does not contain a large vocabulary. Hence the low occurrence of punctuation-attached word variants, resulting in neither improvement nor degradation in performance when pre-processing text with punctuation separation.
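This variance-collapsing effect of punctuation separation can be sketched with a simple regular expression (an illustrative sketch, not the paper's actual pre-processing code):

```python
import re

def separate_punctuation(text):
    # Insert spaces around punctuation so that "eat." tokenizes as "eat" + "."
    return re.sub(r"([.,!?;:()\"'])", r" \1 ", text).split()

raw = "i eat. you eat"
vocab_before = set(raw.split())               # contains both "eat" and "eat."
vocab_after = set(separate_punctuation(raw))  # the "eat." variant is gone

print(vocab_before)
print(vocab_after)
```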
In our experiments, exhaustive removal of stop words decreases the precision and recall of fastText on both the Amazon and Financial datasets. Our stop word list is a general compilation of frequently used words in computational linguistics, covering various POS categories and different levels of importance and meaning. Exhaustively removing all words from this list, without carefully identifying their roles in a particular dataset, reduces sentiment precision. For example, the words "should" and "have" exist in the stop word list under the POS category of Verb; they may carry valuable meaning in a particular context even though they can be less important in other contexts. This finding also helps explain why exhaustive stop word removal is a debated practice among scholars in NLP [5].
Removal of words with a supporting POS role (determiner, conjunction, preposition, or pronoun) decreases the accuracy of fastText on the Financial dataset but improves the accuracy on the Amazon dataset. This finding suggests that a domain-specific dataset contains fewer occurrences of genuinely unimportant words, even though the same words may be unimportant in a generic dataset. In the Financial dataset, words with the POS roles determiner, conjunction, preposition, and pronoun appear to carry some meaning for determining sentiment. Because the supervised learning model learns from fewer examples in a smaller dataset, unnecessary removal of words may worsen the result. Furthermore, word sequence is captured by fastText's n-gram model, so unnecessary removal of valuable words can worsen the accuracy.
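The n-gram point can be illustrated on one of the Financial sentences: removing "up" (treated here as a preposition on the removal list, a simplifying assumption) destroys the word pair that carries the positive cue. A minimal sketch:

```python
def bigrams(tokens):
    # Adjacent word pairs, as captured by a word n-gram feature (n = 2).
    return list(zip(tokens, tokens[1:]))

sentence = "the hang seng was up 1.33%".split()
# Hypothetical preposition removal: "up" is on the removal list.
without_prep = [t for t in sentence if t != "up"]

print(("was", "up") in bigrams(sentence))        # True: the positive-sentiment cue
print(("was", "up") in bigrams(without_prep))    # False: cue lost after removal
```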
The above finding is further supported by experiment 4, set E, where the removal of prepositions produced the highest reduction of both total words and word embedding model size, along with the worst precision. This is due to the nature of the Financial dataset. Unlike the Amazon dataset, which is written in a free-style format, the Financial dataset contains various financial-market indicator words such as "down", "up", and "above". Thus, removing all words with the POS preposition role negatively impacts precision, because it removes keywords that serve as important sentiment indicators in the Financial dataset. Below are examples:

Financial dataset:
  negative: F&N was down 18 sen to RM35.02.
  positive: The Hang Seng was up 1.33%.
  positive: prices have remained above that level since April

Amazon dataset:
  negative: Very disappointed in this book, as it was to have 16 illustrations, as well as other pictures.

The selection of words to remove should therefore be carefully studied, to avoid removing important keywords from a particular dataset.
6. Conclusion
The computational linguistic practice of removing stop words and less important words with supporting POS roles is relevant to text processing. However, not all such words should be removed: some may carry meaningful elements in a particular context, and the importance of a word may vary from one dataset to another. For example, the words "down", "up", and "above" act as sentiment indicators in the Financial dataset. Stop word lists and lists of words with supporting POS roles should be used only as general guidelines in the text pre-processing step. The exact words to remove need to be meticulously selected based on several aspects: (1) the structure of the dataset's sentences (structured or unstructured), (2) the existence of domain-specific terms, (3) word distribution and writing style (formal or informal), and (4) the volume of the dataset.
Basic pre-processing steps such as separating punctuation from words are important for reducing irrelevant combinations of words and punctuation in a larger, generic-domain dataset. Omitting punctuation separation in such a dataset may leave unnecessary noise that increases the size of the word embedding model and impairs the learning process.
7. Future Works
In the future, we plan to focus on finding the actual less important stop words that can impair the learning process for each dataset. Apart from that, we intend to further investigate the impact of removing stop words by various POS categories versus removing all words with supporting POS roles. Separately, research to identify highly important words in sentiment analysis should also be conducted alongside the identification of less important words, to ensure that words and phrases in the training dataset are given the correct weight during the supervised learning process.
8. References
[1] Annett, M., & Kondrak, G. (2008). A Comparison of Sentiment Analysis Techniques: Polarizing Movie Blogs. Advances in Artificial Intelligence, (May), 25–35. https://doi.org/10.1007/978-3-540-68825-9_3
[2] Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2016). Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics, 5. https://doi.org/10.1162/tacl_a_00051
[3] Carter, R., McCarthy, M., Mark, G., & O'Keeffe, A. (2016). English Grammar Today: The Cambridge A-Z Grammar of English. Cambridge, United Kingdom: Cambridge University Press.
[4] Deng, S., Sinha, A. P., & Zhao, H. (2017). Resolving Ambiguity in Sentiment Classification: The Role of Dependency Features. ACM Trans. Manage. Inf. Syst., 8(2–3), 4:1–4:13. https://doi.org/10.1145/3046684
[5] Dolamic, L., & Savoy, J. (2010). Brief communication: When stopword lists make the difference. Journal of the American Society for Information Science and Technology, 61(1), 200–203. https://doi.org/10.1002/asi.21186
[6] Luger, G. F. (1998). Artificial Intelligence: Structures and Strategies for Complex Problem Solving (3rd ed.). Addison-Wesley.
[7] Goldberg, Y. (2017). Neural network methods for natural language processing (Synthesis Lectures on Human Language Technologies). Morgan & Claypool Publishers, Vol. 10, pp. 1–309. https://doi.org/10.1162/COLI_r_00312
[8] Joulin, A., Grave, E., Bojanowski, P., & Mikolov, T. (2017). Bag of Tricks for Efficient Text Classification. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, 2, 427–431. https://doi.org/10.18653/v1/E17-2068
[9] Kumar, A. A., & Chandrasekhar, S. (2012). Text Data Pre-processing and Dimensionality Reduction Techniques for Document Clustering. International Journal of Engineering Research & Technology (IJERT), 1(5), 1–6. ISSN 2278-0181.
[10] Kumar, A., & Panda, S. P. (2018). A Survey of Sentiment Analysis on Social Media. International Journal for Research in Applied Science & Engineering Technology (IJRASET), 6 (February). ISSN 2321-9653.
[11] Larsen, K. R. (2016). MIS Quarterly Article Stopword List. (January).
[12] Larsen, K. R., & Bong, C. H. (2016). A Tool for Addressing Construct Identity in Literature Reviews and Meta-Analyses. MIS Quarterly, 40(3), 529–551. https://doi.org/10.25300/misq/2016/40.3.01
[13] Liu, B. (2010). Sentiment Analysis and Subjectivity. In N. Indurkhya & F. J. Damerau (Eds.), Handbook of Natural Language Processing (2nd ed.). http://www.cs.uic.edu/~liub/FBS/NLP-handbook-sentiment-analysis.pdf
[14] Liu, Q., & Wu, Y. (2012). Supervised Learning. In N. M. Seel (Ed.), Encyclopedia of the
Sciences of Learning (2012th ed.). https://doi.org/10.1007/978-1-4419-1428-6_451
[15] Mao, Y., Balasubramanian, K., & Lebanon, G. (2010). Dimensionality Reduction for Text using Domain Knowledge. COLING '10: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, (August), 801–809.
[16] Martins, C. A., Monard, M. C., & Matsubara, E. T. (2003). Reducing the Dimensionality of Bag-of-Words Text Representation Used by Learning Algorithms. Proceedings of the 3rd IASTED International Conference on Artificial Intelligence and Applications, 228–233. https://pdfs.semanticscholar.org/a90f/ca4c78b66fb0e28f4fe8086d28f3720c7999.pdf
[17] Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up? Sentiment Classification using Machine Learning Techniques. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 79–86. https://doi.org/10.3115/1118693.1118704
[18] Ponmuthuramalingam, P., & Devi, T. (2010). Effective Dimension Reduction Techniques
for Text Documents. IJCSNS International Journal of Computer Science and Network
Security, 10(7). Retrieved from http://paper.ijcsns.org/07_book/201007/20100712.pdf
[19] Rudkowsky, E., Haselmayer, M., Wastian, M., Jenny, M., Emrich, Š., & Sedlmair, M. (2018). More than Bags of Words: Sentiment Analysis with Word Embeddings. Communication Methods and Measures, 12(2–3), 140–157. https://doi.org/10.1080/19312458.2018.1455817
[20] Sadanandan, A. A., Osman, N. A., Saifuddin, H., Khairuddin, M., Pham, D. N., & Hoe, H. (2016). Improving Accuracy in Sentiment Analysis for Malay Language. 4th International Conference on Artificial Intelligence and Computer Science (AICS2016), (November), 28–29.
[21] Vanroose, P. (2001). Part-of-Speech Tagging From an Information-Theoretic Point of View. (22nd July). http://citeseer.ist.psu.edu/646161.html