Although opinion mining is in a nascent stage of development but still the ground is set for dense growth of researches in the field. One of the important activities of opinion mining is to
extract opinions of people based on characteristics of the object under study. Feature extraction in opinion mining can be done by various ways like that of clustering, support vector machines
etc. This paper is an attempt to appraise the various techniques of feature extraction. The first part discusses various techniques and second part makes a detailed appraisal of the major techniques used for feature extraction.
Methodological study of opinion mining and sentiment analysis techniquesijsc
Decision making both on individual and organizational level is always accompanied by the search of
other’s opinion on the same. With tremendous establishment of opinion rich resources like, reviews, forum
discussions, blogs, micro-blogs, Twitter etc provide a rich anthology of sentiments. This user generated
content can serve as a benefaction to market if the semantic orientations are deliberated. Opinion mining
and sentiment analysis are the formalization for studying and construing opinions and sentiments. The
digital ecosystem has itself paved way for use of huge volume of opinionated data recorded. This paper is
an attempt to review and evaluate the various techniques used for opinion and sentiment analysis.
A FUZZY INTERACTIVE BI-OBJECTIVE MODEL FOR SVM TO IDENTIFY THE BEST COMPROMIS...ijfls
A support vector machine (SVM) learns the decision surface from two different classes of the input points. In several applications, some of the input points are misclassified and each is not fully allocated to either of these two groups. In this paper a bi-objective quadratic programming model with fuzzy parameters is utilized and different feature quality measures are optimized simultaneously. An α-cut is defined to transform the fuzzy model to a family of classical bi-objective quadratic programming problems. The weighting method is used to optimize each of these problems. For the proposed fuzzy bi-objective quadratic programming model, a major contribution will be added by obtaining different effective support vectors due to changes in weighting values. The experimental results, show the effectiveness of the α-cut with the weighting parameters on reducing the misclassification between two classes of the input points. An interactive procedure will be added to identify the best compromise solution from the generated efficient solutions. The main contribution of this paper includes constructing a utility function for measuring the degree of infection with coronavirus disease (COVID-19).
Machine learning in science and industry — day 4arogozhnikov
- tabular data approach to machine learning and when it didn't work
- convolutional neural networks and their application
- deep learning: history and today
- generative adversarial networks
- finding optimal hyperparameters
- joint embeddings
Machine learning in science and industry — day 1arogozhnikov
A course of machine learning in science and industry.
- notions and applications
- nearest neighbours: search and machine learning algorithms
- roc curve
- optimal classification and regression
- density estimation
- Gaussian mixtures and EM algorithm
- clustering, an example of clustering in the opera
Here, in this paper we are introducing a dynamic clustering algorithm using fuzzy c-mean clustering algorithm. We will try to process several sets patterns together to find a common structure. The structure is finalized by interchanging prototypes of the given data and by moving the prototypes of the subsequent clusters toward each other. In regular FCM clustering algorithm, fixed numbers of clusters are chosen and those are pre-defined. If, in case, the number of chosen clusters is wrong, then the final result will degrade the purity of the cluster. In our proposed algorithm this drawback will be overcome by using dynamic clustering architecture. Here we will take fixed number of clusters in the beginning but on iterations the algorithm will increase the number of clusters automatically depending on the nature and type of data, which will increase the purity of the result at the end. A detailed clustering algorithm is developed on a basis of the standard FCM method and will be illustrated by means of numeric examples.
Machine learning in science and industry — day 2arogozhnikov
- decision trees
- random forest
- Boosting: adaboost
- reweighting with boosting
- gradient boosting
- learning to rank with gradient boosting
- multiclass classification
- trigger in LHCb
- boosting to uniformity and flatness loss
- particle identification
Methodological study of opinion mining and sentiment analysis techniquesijsc
Decision making both on individual and organizational level is always accompanied by the search of
other’s opinion on the same. With tremendous establishment of opinion rich resources like, reviews, forum
discussions, blogs, micro-blogs, Twitter etc provide a rich anthology of sentiments. This user generated
content can serve as a benefaction to market if the semantic orientations are deliberated. Opinion mining
and sentiment analysis are the formalization for studying and construing opinions and sentiments. The
digital ecosystem has itself paved way for use of huge volume of opinionated data recorded. This paper is
an attempt to review and evaluate the various techniques used for opinion and sentiment analysis.
A FUZZY INTERACTIVE BI-OBJECTIVE MODEL FOR SVM TO IDENTIFY THE BEST COMPROMIS...ijfls
A support vector machine (SVM) learns the decision surface from two different classes of the input points. In several applications, some of the input points are misclassified and each is not fully allocated to either of these two groups. In this paper a bi-objective quadratic programming model with fuzzy parameters is utilized and different feature quality measures are optimized simultaneously. An α-cut is defined to transform the fuzzy model to a family of classical bi-objective quadratic programming problems. The weighting method is used to optimize each of these problems. For the proposed fuzzy bi-objective quadratic programming model, a major contribution will be added by obtaining different effective support vectors due to changes in weighting values. The experimental results, show the effectiveness of the α-cut with the weighting parameters on reducing the misclassification between two classes of the input points. An interactive procedure will be added to identify the best compromise solution from the generated efficient solutions. The main contribution of this paper includes constructing a utility function for measuring the degree of infection with coronavirus disease (COVID-19).
Machine learning in science and industry — day 4arogozhnikov
- tabular data approach to machine learning and when it didn't work
- convolutional neural networks and their application
- deep learning: history and today
- generative adversarial networks
- finding optimal hyperparameters
- joint embeddings
Machine learning in science and industry — day 1arogozhnikov
A course of machine learning in science and industry.
- notions and applications
- nearest neighbours: search and machine learning algorithms
- roc curve
- optimal classification and regression
- density estimation
- Gaussian mixtures and EM algorithm
- clustering, an example of clustering in the opera
Here, in this paper we are introducing a dynamic clustering algorithm using fuzzy c-mean clustering algorithm. We will try to process several sets patterns together to find a common structure. The structure is finalized by interchanging prototypes of the given data and by moving the prototypes of the subsequent clusters toward each other. In regular FCM clustering algorithm, fixed numbers of clusters are chosen and those are pre-defined. If, in case, the number of chosen clusters is wrong, then the final result will degrade the purity of the cluster. In our proposed algorithm this drawback will be overcome by using dynamic clustering architecture. Here we will take fixed number of clusters in the beginning but on iterations the algorithm will increase the number of clusters automatically depending on the nature and type of data, which will increase the purity of the result at the end. A detailed clustering algorithm is developed on a basis of the standard FCM method and will be illustrated by means of numeric examples.
Machine learning in science and industry — day 2arogozhnikov
- decision trees
- random forest
- Boosting: adaboost
- reweighting with boosting
- gradient boosting
- learning to rank with gradient boosting
- multiclass classification
- trigger in LHCb
- boosting to uniformity and flatness loss
- particle identification
In this experiment, I tried to implement Minimum
error rate classifier using the posterior probabilities which
uses Normal distribution to calculate likelihood probabilities to
classify given sample points
Multiclass Recognition with Multiple Feature Treescsandit
This paper proposes a multiclass recognition scheme which uses multiple feature trees with an
extended scoring method evolved from TF-IDF. Feature trees consisting of different feature
descriptors such as SIFT and SURF are built by the hierarchical k-means algorithm. The
experimental results show that the proposed scoring method combing with the proposed
multiple feature trees yields high accuracy for multiclass recognition and achieves significant
improvement compared to methods using a single feature tree with original TF-IDF.
Soft computing is likely to play aprogressively important role in many applications including image enhancement. The paradigm for soft computing is the human mind. The soft computing critique has been particularly strong with fuzzy logic. The fuzzy logic is facts representationas a
rule for management of uncertainty. Inthis paperthe Multi-Dimensional optimized problem is addressed by discussing the optimal thresholding usingfuzzyentropyfor Image enhancement. This technique is compared with bi-level and multi-level thresholding and obtained optimal
thresholding values for different levels of speckle noisy and low contrasted images. The fuzzy entropy method has produced better results compared to bi-level and multi-level thresholding techniques.
Using Deep Learning to Find Similar DressesHJ van Veen
Report by Luís Mey ( https://www.linkedin.com/in/lu%C3%ADs-gustavo-bernardo-mey-97b38927/ ) on Udacity Machine Learning Course - Final Project: Use Deep Learning to Find Similar Dresses.
DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...cscpconf
In this paper, Design and Implementation of Binary Neural Network Learning with Fuzzy
Clustering (DIBNNFC), is proposed to classify semisupervised data, it is based on the
concept of binary neural network and geometrical expansion. Parameters are updated
according to the geometrical location of the training samples in the input space, and each
sample in the training set is learned only once. It’s a semisupervised based approach, the
training samples are semi-labelled i.e. for some samples, labels are known and for some
samples data labels are not known. The method starts with classification, which is done by
using the concept of ETL algorithm. In classification process various classes are formed.
These classes classify samples in to two classes after that considers each class as a region and calculates the average of the entire region separately. This average is centres of the region which is used for the purpose of clustering by using FCM algorithm. Once clustering process over labelling of semi supervised data is done, then whole samples would be classify by (DIBNNFC). The method proposes here is exhaustively tested with different benchmark datasets and it is found that, on increasing value of training parameters number of hidden neurons and training time both are getting decrease. The result reported, using real character recognition data set and result will compare with existing semi-supervised classifier, the proposed approach learned with semi-supervised leads to higher classification accuracy.
Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...Marina Santini
In this lecture, we talk about two different discriminative machine learning methods: decision trees and k-nearest neighbors. Decision trees are hierarchical structures.k-nearest neighbors are based on two principles: recollection and resemblance.
This work is proposed the feed forward neural network with symmetric table addition method to design the
neuron synapses algorithm of the sine function approximations, and according to the Taylor series
expansion. Matlab code and LabVIEW are used to build and create the neural network, which has been
designed and trained database set to improve its performance, and gets the best a global convergence with
small value of MSE errors and 97.22% accuracy.
A BI-OBJECTIVE MODEL FOR SVM WITH AN INTERACTIVE PROCEDURE TO IDENTIFY THE BE...ijaia
A support vector machine (SVM) learns the decision surface from two different classes of the input points, there are misclassifications in some of the input points in several applications. In this paper a bi-objective quadratic programming model is utilized and different feature quality measures are optimized simultaneously using the weighting method for solving our bi-objective quadratic programming problem. An important contribution will be added for the proposed bi-objective quadratic programming model by getting different efficient support vectors due to changing the weighting values. The numerical examples, give evidence of the effectiveness of the weighting parameters on reducing the misclassification between two classes of the input points. An interactive procedure will be added to identify the best compromise solution from the generated efficient solutions.
Methodological Study Of Opinion Mining And Sentiment Analysis Techniques ijsc
Decision making both on individual and organizational level is always accompanied by the search of other’s opinion on the same. With tremendous establishment of opinion rich resources like, reviews, forum discussions, blogs, micro-blogs, Twitter etc provide a rich anthology of sentiments. This user generated content can serve as a benefaction to market if the semantic orientations are deliberated. Opinion mining and sentiment analysis are the formalization for studying and construing opinions and sentiments. The digital ecosystem has itself paved way for use of huge volume of opinionated data recorded. This paper is an attempt to review and evaluate the various techniques used for opinion and sentiment analysis.
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESVikash Kumar
IMAGE CLASSIFICATION USING KNN, RANDOM FOREST AND SVM ALGORITHM ON GLAUCOMA DATASETS AND EXPLAIN THE ACCURACY, SENSITIVITY, AND SPECIFICITY OF EACH AND EVERY ALGORITHMS
In this experiment, I tried to implement Minimum
error rate classifier using the posterior probabilities which
uses Normal distribution to calculate likelihood probabilities to
classify given sample points
Multiclass Recognition with Multiple Feature Treescsandit
This paper proposes a multiclass recognition scheme which uses multiple feature trees with an
extended scoring method evolved from TF-IDF. Feature trees consisting of different feature
descriptors such as SIFT and SURF are built by the hierarchical k-means algorithm. The
experimental results show that the proposed scoring method combing with the proposed
multiple feature trees yields high accuracy for multiclass recognition and achieves significant
improvement compared to methods using a single feature tree with original TF-IDF.
Soft computing is likely to play aprogressively important role in many applications including image enhancement. The paradigm for soft computing is the human mind. The soft computing critique has been particularly strong with fuzzy logic. The fuzzy logic is facts representationas a
rule for management of uncertainty. Inthis paperthe Multi-Dimensional optimized problem is addressed by discussing the optimal thresholding usingfuzzyentropyfor Image enhancement. This technique is compared with bi-level and multi-level thresholding and obtained optimal
thresholding values for different levels of speckle noisy and low contrasted images. The fuzzy entropy method has produced better results compared to bi-level and multi-level thresholding techniques.
Using Deep Learning to Find Similar DressesHJ van Veen
Report by Luís Mey ( https://www.linkedin.com/in/lu%C3%ADs-gustavo-bernardo-mey-97b38927/ ) on Udacity Machine Learning Course - Final Project: Use Deep Learning to Find Similar Dresses.
DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...cscpconf
In this paper, Design and Implementation of Binary Neural Network Learning with Fuzzy
Clustering (DIBNNFC), is proposed to classify semisupervised data, it is based on the
concept of binary neural network and geometrical expansion. Parameters are updated
according to the geometrical location of the training samples in the input space, and each
sample in the training set is learned only once. It’s a semisupervised based approach, the
training samples are semi-labelled i.e. for some samples, labels are known and for some
samples data labels are not known. The method starts with classification, which is done by
using the concept of ETL algorithm. In classification process various classes are formed.
These classes classify samples in to two classes after that considers each class as a region and calculates the average of the entire region separately. This average is centres of the region which is used for the purpose of clustering by using FCM algorithm. Once clustering process over labelling of semi supervised data is done, then whole samples would be classify by (DIBNNFC). The method proposes here is exhaustively tested with different benchmark datasets and it is found that, on increasing value of training parameters number of hidden neurons and training time both are getting decrease. The result reported, using real character recognition data set and result will compare with existing semi-supervised classifier, the proposed approach learned with semi-supervised leads to higher classification accuracy.
Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...Marina Santini
In this lecture, we talk about two different discriminative machine learning methods: decision trees and k-nearest neighbors. Decision trees are hierarchical structures.k-nearest neighbors are based on two principles: recollection and resemblance.
This work is proposed the feed forward neural network with symmetric table addition method to design the
neuron synapses algorithm of the sine function approximations, and according to the Taylor series
expansion. Matlab code and LabVIEW are used to build and create the neural network, which has been
designed and trained database set to improve its performance, and gets the best a global convergence with
small value of MSE errors and 97.22% accuracy.
A BI-OBJECTIVE MODEL FOR SVM WITH AN INTERACTIVE PROCEDURE TO IDENTIFY THE BE...ijaia
A support vector machine (SVM) learns the decision surface from two different classes of the input points, there are misclassifications in some of the input points in several applications. In this paper a bi-objective quadratic programming model is utilized and different feature quality measures are optimized simultaneously using the weighting method for solving our bi-objective quadratic programming problem. An important contribution will be added for the proposed bi-objective quadratic programming model by getting different efficient support vectors due to changing the weighting values. The numerical examples, give evidence of the effectiveness of the weighting parameters on reducing the misclassification between two classes of the input points. An interactive procedure will be added to identify the best compromise solution from the generated efficient solutions.
Methodological Study Of Opinion Mining And Sentiment Analysis Techniques ijsc
Decision making both on individual and organizational level is always accompanied by the search of other’s opinion on the same. With tremendous establishment of opinion rich resources like, reviews, forum discussions, blogs, micro-blogs, Twitter etc provide a rich anthology of sentiments. This user generated content can serve as a benefaction to market if the semantic orientations are deliberated. Opinion mining and sentiment analysis are the formalization for studying and construing opinions and sentiments. The digital ecosystem has itself paved way for use of huge volume of opinionated data recorded. This paper is an attempt to review and evaluate the various techniques used for opinion and sentiment analysis.
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESVikash Kumar
IMAGE CLASSIFICATION USING KNN, RANDOM FOREST AND SVM ALGORITHM ON GLAUCOMA DATASETS AND EXPLAIN THE ACCURACY, SENSITIVITY, AND SPECIFICITY OF EACH AND EVERY ALGORITHMS
A Fuzzy Interactive BI-objective Model for SVM to Identify the Best Compromis...ijfls
A support vector machine (SVM) learns the decision surface from two different classes of the input points. In several applications, some of the input points are misclassified and each is not fully allocated to either of these two groups. In this paper a bi-objective quadratic programming model with fuzzy parameters is utilized and different feature quality measures are optimized simultaneously. An α-cut is defined to transform the fuzzy model to a family of classical bi-objective quadratic programming problems. The weighting method is used to optimize each of these problems. For the proposed fuzzy bi-objective quadratic programming model, a major contribution will be added by obtaining different effective support vectors due to changes in weighting values. The experimental results, show the effectiveness of the α-cut with the weighting parameters on reducing the misclassification between two classes of the input points. An interactive procedure will be added to identify the best compromise solution from the generated efficient solutions. The main contribution of this paper includes constructing a utility function for measuring the degree of infection with coronavirus disease (COVID-19).
The International Journal of Engineering and Science (The IJES)theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
A BI-OBJECTIVE MODEL FOR SVM WITH AN INTERACTIVE PROCEDURE TO IDENTIFY THE BE...gerogepatton
A support vector machine (SVM) learns the decision surface from two different classes of the input points, there are misclassifications in some of the input points in several applications. In this paper a bi-objective quadratic programming model is utilized and different feature quality measures are optimized simultaneously using the weighting method for solving our bi-objective quadratic programming problem. An important contribution will be added for the proposed bi-objective quadratic programming model by getting different efficient support vectors due to changing the weighting values. The numerical examples, give evidence of the effectiveness of the weighting parameters on reducing the misclassification between two classes of the input points. An interactive procedure will be added to identify the best compromise solution from the generated efficient solutions.
FIDUCIAL POINTS DETECTION USING SVM LINEAR CLASSIFIERScsandit
Currently, there is a growing interest from the scientific and/or industrial community in respect
to methods that offer solutions to the problem of fiducial points detection in human faces. Some
methods use the SVM for classification, but we observed that some formulations of optimization
problems were not discussed. In this article, we propose to investigate the performance of
mathematical formulation C-SVC when applied in fiducial point detection system. Futhermore,
we explore new parameters for training the proposed system. The performance of the proposed
system is evaluated in a fiducial points detection problem. The results demonstrate that the
method is competitive.
SPEECH CLASSIFICATION USING ZERNIKE MOMENTScscpconf
Speech recognition is very popular field of research and speech classification improves the performance
for speech recognition. Different patterns are identified using various characteristics or features of speech
to do there classification. Typical speech features set consist of many parameters like standard deviation,
magnitude, zero crossing representing speech signal. By considering all these parameters, system
computation load and time will increase a lot, so there is need to minimize these parameters by selecting
important features. Feature selection aims to get an optimal subset of features from given space, leading to
high classification performance. Thus feature selection methods should derive features that should reduce
the amount of data used for classification. High recognition accuracy is in demand for speech recognition
system. In this paper Zernike moments of speech signal are extracted and used as features of speech signal.
Zernike moments are the shape descriptor generally used to describe the shape of region. To extract
Zernike moments, one dimensional audio signal is converted into two dimensional image file. Then various
feature selection and ranking algorithms like t-Test, Chi Square, Fisher Score, ReliefF, Gini Index and
Information Gain are used to select important feature of speech signal. Performances of the algorithms are
evaluated using accuracy of classifier. Support Vector Machine (SVM) is used as the learning algorithm of
classifier and it is observed that accuracy is improved a lot after removing unwanted features.
Similar to Radial Basis Function Neural Network (RBFNN), Induction Motor, Vector control, k-means algorithm. (20)
ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR cscpconf
The progressive development of Synthetic Aperture Radar (SAR) systems diversify the exploitation of the generated images by these systems in different applications of geoscience. Detection and monitoring surface deformations, procreated by various phenomena had benefited from this evolution and had been realized by interferometry (InSAR) and differential interferometry (DInSAR) techniques. Nevertheless, spatial and temporal decorrelations of the interferometric couples used, limit strongly the precision of analysis results by these techniques. In this context, we propose, in this work, a methodological approach of surface deformation detection and analysis by differential interferograms to show the limits of this technique according to noise quality and level. The detectability model is generated from the deformation signatures, by simulating a linear fault merged to the images couples of ERS1 / ERS2 sensors acquired in a region of the Algerian south.
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATIONcscpconf
A novel based a trajectory-guided, concatenating approach for synthesizing high-quality image real sample renders video is proposed . The lips reading automated is seeking for modeled the closest real image sample sequence preserve in the library under the data video to the HMM predicted trajectory. The object trajectory is modeled obtained by projecting the face patterns into an KDA feature space is estimated. The approach for speaker's face identification by using synthesise the identity surface of a subject face from a small sample of patterns which sparsely each the view sphere. An KDA algorithm use to the Lip-reading image is discrimination, after that work consisted of in the low dimensional for the fundamental lip features vector is reduced by using the 2D-DCT.The mouth of the set area dimensionality is ordered by a normally reduction base on the PCA to obtain the Eigen lips approach, their proposed approach by[33]. The subjective performance results of the cost function under the automatic lips reading modeled , which wasn’t illustrate the superior performance of the
method.
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...cscpconf
Universities offer software engineering capstone course to simulate a real world-working environment in which students can work in a team for a fixed period to deliver a quality product. The objective of the paper is to report on our experience in moving from Waterfall process to Agile process in conducting the software engineering capstone project. We present the capstone course designs for both Waterfall driven and Agile driven methodologies that highlight the structure, deliverables and assessment plans.To evaluate the improvement, we conducted a survey for two different sections taught by two different instructors to evaluate students’ experience in moving from traditional Waterfall model to Agile like process. Twentyeight students filled the survey. The survey consisted of eight multiple-choice questions and an open-ended question to collect feedback from students. The survey results show that students were able to attain hands one experience, which simulate a real world-working environment. The results also show that the Agile approach helped students to have overall better design and avoid mistakes they have made in the initial design completed in of the first phase of the capstone project. In addition, they were able to decide on their team capabilities, training needs and thus learn the required technologies earlier which is reflected on the final product quality
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIEScscpconf
Using social media in education provides learners with an informal way for communication. Informal communication tends to remove barriers and hence promotes student engagement. This paper presents our experience in using three different social media technologies in teaching software project management course. We conducted different surveys at the end of every semester to evaluate students’ satisfaction and engagement. Results show that using social media enhances students’ engagement and satisfaction. However, familiarity with the tool is an important factor for student satisfaction.
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGICcscpconf
In real world computing environment with using a computer to answer questions has been a human dream since the beginning of the digital era, Question-answering systems are referred to as intelligent systems, that can be used to provide responses for the questions being asked by the user based on certain facts or rules stored in the knowledge base it can generate answers of questions asked in natural , and the first main idea of fuzzy logic was to working on the problem of computer understanding of natural language, so this survey paper provides an overview on what Question-Answering is and its system architecture and the possible relationship and
different with fuzzy logic, as well as the previous related research with respect to approaches that were followed. At the end, the survey provides an analytical discussion of the proposed QA models, along or combined with fuzzy logic and their main contributions and limitations.
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS cscpconf
Human beings generate different speech waveforms while speaking the same word at different times. Also, different human beings have different accents and generate significantly varying speech waveforms for the same word. There is a need to measure the distances between various words which facilitate preparation of pronunciation dictionaries. A new algorithm called Dynamic Phone Warping (DPW) is presented in this paper. It uses dynamic programming technique for global alignment and shortest distance measurements. The DPW algorithm can be used to enhance the pronunciation dictionaries of the well-known languages like English or to build pronunciation dictionaries to the less known sparse languages. The precision measurement experiments show 88.9% accuracy.
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS cscpconf
In education, the use of electronic (E) examination systems is not a novel idea, as Eexamination systems have been used to conduct objective assessments for the last few years. This research deals with randomly designed E-examinations and proposes an E-assessment system that can be used for subjective questions. This system assesses answers to subjective questions by finding a matching ratio for the keywords in instructor and student answers. The matching ratio is achieved based on semantic and document similarity. The assessment system is composed of four modules: preprocessing, keyword expansion, matching, and grading. A survey and case study were used in the research design to validate the proposed system. The examination assessment system will help instructors to save time, costs, and resources, while increasing efficiency and improving the productivity of exam setting and assessments.
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTICcscpconf
African Buffalo Optimization (ABO) is one of the most recent swarms intelligence based metaheuristics. ABO algorithm is inspired by the buffalo’s behavior and lifestyle. Unfortunately, the standard ABO algorithm is proposed only for continuous optimization problems. In this paper, the authors propose two discrete binary ABO algorithms to deal with binary optimization problems. In the first version (called SBABO) they use the sigmoid function and probability model to generate binary solutions. In the second version (called LBABO) they use some logical operator to operate the binary solutions. Computational results on two knapsack problems (KP and MKP) instances show the effectiveness of the proposed algorithm and their ability to achieve good and promising solutions.
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAINcscpconf
In recent years, many malware writers have relied on Dynamic Domain Name Services (DDNS) to maintain their Command and Control (C&C) network infrastructure to ensure a persistence presence on a compromised host. Amongst the various DDNS techniques, Domain Generation Algorithm (DGA) is often perceived as the most difficult to detect using traditional methods. This paper presents an approach for detecting DGA using frequency analysis of the character distribution and the weighted scores of the domain names. The approach’s feasibility is demonstrated using a range of legitimate domains and a number of malicious algorithmicallygenerated domain names. Findings from this study show that domain names made up of English characters “a-z” achieving a weighted score of < 45 are often associated with DGA. When a weighted score of < 45 is applied to the Alexa one million list of domain names, only 15% of the domain names were treated as non-human generated.
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...cscpconf
The amount of piracy in the streaming digital content in general and the music industry in specific is posing a real challenge to digital content owners. This paper presents a DRM solution to monetizing, tracking and controlling online streaming content cross platforms for IP enabled devices. The paper benefits from the current advances in Blockchain and cryptocurrencies. Specifically, the paper presents a Global Music Asset Assurance (GoMAA) digital currency and presents the iMediaStreams Blockchain to enable the secure dissemination and tracking of the streamed content. The proposed solution provides the data owner the ability to control the flow of information even after it has been released by creating a secure, selfinstalled, cross platform reader located on the digital content file header. The proposed system provides the content owners’ options to manage their digital information (audio, video, speech, etc.), including the tracking of the most consumed segments, once it is release. The system benefits from token distribution between the content owner (Music Bands), the content distributer (Online Radio Stations) and the content consumer(Fans) on the system blockchain.
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEMcscpconf
This paper discusses the importance of verb suffix mapping in Discourse translation system. In
discourse translation, the crucial step is Anaphora resolution and generation. In Anaphora
resolution, cohesion links like pronouns are identified between portions of text. These binders
make the text cohesive by referring to nouns appearing in the previous sentences or nouns
appearing in sentences after them. In Machine Translation systems, to convert the source
language sentences into meaningful target language sentences the verb suffixes should be
changed as per the cohesion links identified. This step of translation process is emphasized in
the present paper. Specifically, the discussion is on how the verbs change according to the
subjects and anaphors. To explain the concept, English is used as the source language (SL) and
an Indian language Telugu is used as Target language (TL)
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...cscpconf
In this paper, based on the definition of conformable fractional derivative, the functional
variable method (FVM) is proposed to seek the exact traveling wave solutions of two higherdimensional
space-time fractional KdV-type equations in mathematical physics, namely the
(3+1)-dimensional space–time fractional Zakharov-Kuznetsov (ZK) equation and the (2+1)-
dimensional space–time fractional Generalized Zakharov-Kuznetsov-Benjamin-Bona-Mahony
(GZK-BBM) equation. Some new solutions are procured and depicted. These solutions, which
contain kink-shaped, singular kink, bell-shaped soliton, singular soliton and periodic wave
solutions, have many potential applications in mathematical physics and engineering. The
simplicity and reliability of the proposed method is verified.
AUTOMATED PENETRATION TESTING: AN OVERVIEWcscpconf
The using of information technology resources is rapidly increasing in organizations,
businesses, and even governments, that led to arise various attacks, and vulnerabilities in the
field. All resources make it a must to do frequently a penetration test (PT) for the environment
and see what can the attacker gain and what is the current environment's vulnerabilities. This
paper reviews some of the automated penetration testing techniques and presents its
enhancement over the traditional manual approaches. To the best of our knowledge, it is the
first research that takes into consideration the concept of penetration testing and the standards
in the area.This research tackles the comparison between the manual and automated
penetration testing, the main tools used in penetration testing. Additionally, compares between
some methodologies used to build an automated penetration testing platform.
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORKcscpconf
Since the mid of 1990s, functional connectivity study using fMRI (fcMRI) has drawn increasing
attention of neuroscientists and computer scientists, since it opens a new window to explore
functional network of human brain with relatively high resolution. BOLD technique provides
almost accurate state of brain. Past researches prove that neuro diseases damage the brain
network interaction, protein- protein interaction and gene-gene interaction. A number of
neurological research paper also analyse the relationship among damaged part. By
computational method especially machine learning technique we can show such classifications.
In this paper we used OASIS fMRI dataset affected with Alzheimer’s disease and normal
patient’s dataset. After proper processing the fMRI data we use the processed data to form
classifier models using SVM (Support Vector Machine), KNN (K- nearest neighbour) & Naïve
Bayes. We also compare the accuracy of our proposed method with existing methods. In future,
we will other combinations of methods for better accuracy.
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...cscpconf
In order to treat and analyze real datasets, fuzzy association rules have been proposed. Several
algorithms have been introduced to extract these rules. However, these algorithms suffer from
the problems of utility, redundancy and large number of extracted fuzzy association rules. The
expert will then be confronted with this huge amount of fuzzy association rules. The task of
validation becomes fastidious. In order to solve these problems, we propose a new validation
method. Our method is based on three steps. (i) We extract a generic base of non redundant
fuzzy association rules by applying EFAR-PN algorithm based on fuzzy formal concept analysis.
(ii) we categorize extracted rules into groups and (iii) we evaluate the relevance of these rules
using structural equation model.
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATAcscpconf
In many applications of data mining, class imbalance is noticed when examples in one class are
overrepresented. Traditional classifiers result in poor accuracy of the minority class due to the
class imbalance. Further, the presence of within class imbalance where classes are composed of
multiple sub-concepts with different number of examples also affect the performance of
classifier. In this paper, we propose an oversampling technique that handles between class and
within class imbalance simultaneously and also takes into consideration the generalization
ability in data space. The proposed method is based on two steps- performing Model Based
Clustering with respect to classes to identify the sub-concepts; and then computing the
separating hyperplane based on equal posterior probability between the classes. The proposed
method is tested on 10 publicly available data sets and the result shows that the proposed
method is statistically superior to other existing oversampling methods.
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCHcscpconf
Data collection is an essential, but manpower intensive procedure in ecological research. An
algorithm was developed by the author which incorporated two important computer vision
techniques to automate data cataloging for butterfly measurements. Optical Character
Recognition is used for character recognition and Contour Detection is used for imageprocessing.
Proper pre-processing is first done on the images to improve accuracy. Although
there are limitations to Tesseract’s detection of certain fonts, overall, it can successfully identify
words of basic fonts. Contour detection is an advanced technique that can be utilized to
measure an image. Shapes and mathematical calculations are crucial in determining the precise
location of the points on which to draw the body and forewing lines of the butterfly. Overall,
92% accuracy were achieved by the program for the set of butterflies measured.
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...cscpconf
Smart cities utilize Internet of Things (IoT) devices and sensors to enhance the quality of the city
services including energy, transportation, health, and much more. They generate massive
volumes of structured and unstructured data on a daily basis. Also, social networks, such as
Twitter, Facebook, and Google+, are becoming a new source of real-time information in smart
cities. Social network users are acting as social sensors. These datasets so large and complex
are difficult to manage with conventional data management tools and methods. To become
valuable, this massive amount of data, known as 'big data,' needs to be processed and
comprehended to hold the promise of supporting a broad range of urban and smart cities
functions, including among others transportation, water, and energy consumption, pollution
surveillance, and smart city governance. In this work, we investigate how social media analytics
help to analyze smart city data collected from various social media sources, such as Twitter and
Facebook, to detect various events taking place in a smart city and identify the importance of
events and concerns of citizens regarding some events. A case scenario analyses the opinions of
users concerning the traffic in three largest cities in the UAE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGEcscpconf
The anonymity of social networks makes it attractive for hate speech to mask their criminal
activities online posing a challenge to the world and in particular Ethiopia. With this everincreasing
volume of social media data, hate speech identification becomes a challenge in
aggravating conflict between citizens of nations. The high rate of production, has become
difficult to collect, store and analyze such big data using traditional detection methods. This
paper proposed the application of apache spark in hate speech detection to reduce the
challenges. Authors developed an apache spark based model to classify Amharic Facebook
posts and comments into hate and not hate. Authors employed Random forest and Naïve Bayes
for learning and Word2Vec and TF-IDF for feature selection. Tested by 10-fold crossvalidation,
the model based on word2vec embedding performed best with 79.83%accuracy. The
proposed method achieve a promising result with unique feature of spark for big data.
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXTcscpconf
This article presents Part of Speech tagging for Nepali text using General Regression Neural
Network (GRNN). The corpus is divided into two parts viz. training and testing. The network is
trained and validated on both training and testing data. It is observed that 96.13% words are
correctly being tagged on training set whereas 74.38% words are tagged correctly on testing
data set using GRNN. The result is compared with the traditional Viterbi algorithm based on
Hidden Markov Model. Viterbi algorithm yields 97.2% and 40% classification accuracies on
training and testing data sets respectively. GRNN based POS Tagger is more consistent than the
traditional Viterbi decoding technique.
How to Create Map Views in the Odoo 17 ERPCeline George
The map views are useful for providing a geographical representation of data. They allow users to visualize and analyze the data in a more intuitive manner.
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdfTechSoup
In this webinar you will learn how your organization can access TechSoup's wide variety of product discount and donation programs. From hardware to software, we'll give you a tour of the tools available to help your nonprofit with productivity, collaboration, financial management, donor tracking, security, and more.
2024.06.01 Introducing a competency framework for languag learning materials ...Sandy Millin
http://sandymillin.wordpress.com/iateflwebinar2024
Published classroom materials form the basis of syllabuses, drive teacher professional development, and have a potentially huge influence on learners, teachers and education systems. All teachers also create their own materials, whether a few sentences on a blackboard, a highly-structured fully-realised online course, or anything in between. Despite this, the knowledge and skills needed to create effective language learning materials are rarely part of teacher training, and are mostly learnt by trial and error.
Knowledge and skills frameworks, generally called competency frameworks, for ELT teachers, trainers and managers have existed for a few years now. However, until I created one for my MA dissertation, there wasn’t one drawing together what we need to know and do to be able to effectively produce language learning materials.
This webinar will introduce you to my framework, highlighting the key competencies I identified from my research. It will also show how anybody involved in language teaching (any language, not just English!), teacher training, managing schools or developing language learning materials can benefit from using the framework.
Operation “Blue Star” is the only event in the history of Independent India where the state went into war with its own people. Even after about 40 years it is not clear if it was culmination of states anger over people of the region, a political game of power or start of dictatorial chapter in the democratic setup.
The people of Punjab felt alienated from main stream due to denial of their just demands during a long democratic struggle since independence. As it happen all over the word, it led to militant struggle with great loss of lives of military, police and civilian personnel. Killing of Indira Gandhi and massacre of innocent Sikhs in Delhi and other India cities was also associated with this movement.
How to Split Bills in the Odoo 17 POS ModuleCeline George
Bills have a main role in point of sale procedure. It will help to track sales, handling payments and giving receipts to customers. Bill splitting also has an important role in POS. For example, If some friends come together for dinner and if they want to divide the bill then it is possible by POS bill splitting. This slide will show how to split bills in odoo 17 POS.
The Art Pastor's Guide to Sabbath | Steve ThomasonSteve Thomason
What is the purpose of the Sabbath Law in the Torah. It is interesting to compare how the context of the law shifts from Exodus to Deuteronomy. Who gets to rest, and why?
2. 86 Computer Science & Information Technology (CS & IT)
3) Multi Layer Perceptron (MLP)
4) Clustering Classifier
In this paper, I have categorized the work done for feature extraction and classification in opinion
mining and sentiment analysis. We have also tried to find out the performance, advantages and
disadvantages of different techniques.
2. DATA SETS
This section provides brief details of datasets used by us in our experiments.
2.1 Product Review Dataset
The following dataset is taken from the work of blitzer and is of multi-domain characteristics. It
consists of product reviews taken from amazon.com which belongs to a total of 25 categories like
toys, videos, reviews etc. From the randomly selected five domains 4000 +ve and 4000-ve re-
views are randomly samples.
2.2 Movie Review Dataset
The second dataset is taken from the work of pang & lee (2004). It contains movie review with of
the feature of 1000+ve and 1000-ve processed movie reviews.
Table 1: Result of Simple n-gram
Movie Review Product Reviews
Unigram only 64.1 42.91
Bigram 76.15 69.62
Trigram 76.1 71.37
(uni+bi) gram 77.15 72.94
(uni+bi+tri)
gram
80.15 78.67
3. CLASSIFICATION TECHNIQUES
3.1 Naïve Bayes Classifier
This classifier is based on the probability statement that was given by Bayes. This theorem pro-
vides conditional probability of occurrence of event E1 when E2 has already occurred, the vice
versa can also be calculated by following mathematical statement.
2
112
21
E
)E(P)E|E(P
)E|E(P =
This basically helps in deciding the polarity of data in which opinions / reviews / arguments can
be classified as positive or negative which is facilitated by collection of positive or negative ex-
amples already fed.
Naïve Bayes algorithm is implemented to estimate the probability of a data to be negative or posi-
tive. The aforesaid probability is calculated by studying positive and negative examples & then
calculating the frequency of each pole which is technically termed as learning. This learning is
actually supervised as there is an existence of examples1
. Thus, the conditional probability of a
word with positive or negative meaning is calculated in view of a plethora of positive and negative ex-
amples & calculating the frequency of each of class.
3. Computer Science & Information Technology (CS & IT) 87
)Sentence(P
)Sentiment|Sentence(P)Sentiment(P
)Sentence|Sentiment(P =
So,
WordofnosTotalclassatobelongingwordsofNumber
1classinoccurencewordofNumber
)Sentiment|Word(P
+
+
= 1
Algorithm is shown in figure 2:
Algorithm has following steps
S1: Initialize P(positive) ← num − popozitii (positive)/ num_total_propozitii
S2: Initialize P(negative) ← num − popozitii (negative) / num_total_propozitii
S3: Convert sentences into words
for each class of {positive, negative}:
for each word in {phrase}
P(word | class) < num_apartii (word | class) 1 | num_cuv (class) +
num_total_cuvinte
P (class) ←P (class) * P (word | class)
Returns max {P(pos), P(neg)}
3.1.1. Evaluation of Algorithm
The measures used for algorithm evaluation are:
Accuracy
Precision
Recall
Relevance
Contingency table for analysis of algorithm:
Relevant Irrelevant
Detected Opinions True Positive (tp) False Positive (fp)
Undetected Opinions False Negative (fn) True Negative (tn)
Now, Precision =
fp+tp
tp
Accuracy = F,
fn+fp+tn+tp
tn+tp
=
callRe+ecisionPr
callRe*ecisionPr*2
; Recall =
fn+tp
tp
1
The concept is to actually a fact that each & every word in a label is already referred in learning set.
Classifier
Classifier
Training Set
+ve Sentence
−ve Sentence
Classifier
Sentence
Review
Book Review
Figure 2
4. 88 Computer Science & Information Technology (CS & IT)
3.1.2. Accuracy
[1] Ion SMEUREANU, Cristian BUCUR, training the naïve gauss algorithm on 5000 sentences
and get 0.79939209726444 accuracy where no of groups (n) is 2.
3.1.3. Advantages of Naïve Bayes Classification Methods
1. Model is easy to interpret
2. Efficient computation.
3.1.4. Disadvantage of Naïve Bayes Classification Methods
Assumptions of attributes being independent, which may not be necessarily valid.
3.2 Support Vector Machine (SVM)
The basic goal of support vector machine is to search a decision boundary between two classes
that is excellently far away from any point in the training data2
.
SVM develops a hyper planes or a set of hyper planes in infinite dimension space. This distance
from decision surface to closest data point determines the margin of classifier. So the hyper
planes act as decision surface which act as criteria to decide the distance of any data point from it.
The margin of classifier is calculated by the distance from the closest data point. This success-
fully creates a classification but a slight error will not cause a misclassification.
For a training data set D, a set of n points:
( ) { }{ } )1.......(1,1¬εc,Rεxc,x=D
n
1=ii
p
iii
Where, xi is a p-dimensional real vector. Find the maximum-margin hyper plane i.e. splits the
points having ci = 1 from those having ci = -1. Any hyperplane can be written as the set of points
satisfying:
)........(21=b-x•w
Finding a maximum margin hyperplane, reduces to finding the pair w and b, such that the dis-
tance between the hyperplanes is maximal while still separating the data. These hyperplanes (fig.
1) are described by:
1b-xw =• and 1-b-xw =•
The distance between two hyperplanes is
w
b
and therefore w needs to be minimized. Or we
can say that minimize w in w, b subject to ci (w.xi - b) ≥ 1 for any i = 1, … , n.
Using Lagranges multipliers (αi) this optimization problem can be expressed as:
Input Space Feature Space
Figure 3: Principle of Support Vector Machine
φ
Separating
Hyperplane
5. Computer Science & Information Technology (CS & IT) 89
( )3.....}]1-)b-x.w(c[α-w
2
1
{
αb,w
maxmin
∑
n
1=i
iii
2
2
3.2.1. Extensions of SVM
To make SVM more robust and also more adaptable to real world problems, it has some exten-
sions, some of which include following:
1. Soft Margin Classification
For very high dimensional problems, common in text classification, sometimes, data are linearly
separable. For multidimensional problems like in classification of text, data are linearly separable
sometimes. But in most cases, the opinion solution is the one that classifies most of the data and
ignore some outliers or noisy data. If the training set D cannot be separated clearly then the solu-
tion is to have fat decision classifiers and make some mistake.
Mathematically, a stack variable ξi are introduced that are not equal to zero which allow xi to not
meet the margin requirements with a cost i.e., proportional to ξ.
2. Non-linear Classification
The basic linear SVM was given by Vapnik (1963) later on Bernhard Boser, Isabelle Guyon and
Vapnik in 1992 pavd a way for non-linear classifiers by using kernel to max. margin hyper
planes. The only difference from th kernel trick given by Aizerman is that every dot product is
replaced by non-linear kernel function.
The effectiveness of SVM in this case lies in the selection of the kernel and soft margin parame-
ters.
3. Multiclass SVM
Generally SVM is applicable for two class tasks. But multiclass problems can be deal with multi-
class SVM. In this case labels are designed to instances which are drawn from a finite set of vari-
ous elements. These binary classifiers can be built by two classifiers like by either distinguish one
versus all labels or between every pair of classes one versus one.
3.2.2. Accuracy
Usenet reviews were classified by pang by using numerical ratings which were accompanied as
basic truth. Various learning methods are used but unigrams gave best outputs in a presence based
frequency model run by SVM. The accuracy undergone is the process was 82.9%.
3.2.3. Advantages of Support Vector Machine Method
1. Very good performance on experimental results
2. Low dependency on data set dimensionality.
3.2.4. Disadvantages of Support Vector Machine Method
1. Categorical or missing values need to be pre-processed.
2. Difficult interpretation of resulting model.
2
Possibly discounting some points as outliers or noise.
6. 90 Computer Science & Information Technology (CS & IT)
3.3 Multi-Layer Perceptron (MLP)
MLP is a neural network which is feed forward with one or more layers between input and out-
put. Feed forward implies that, data flows in one direction i.e., from input layer to output layer
(i.e., in forward direction). This ANN which multilayer perceptron starts with input layer where
each node or neuron means a predicator variable. Input neurons are connected with each neuron
in hidden layers. The neurons in hidden layer are in turn connected to neuron in other hidden lay-
ers. The output layer is made up of one neuron in case of binary prediction or more than one neu-
ron in case of non-binary prediction. Such arrangement makes a streamlined flow of information
from input layer to output layer.
MLP technique is quite popular owing to the fact that it can act as universal function approxima-
tor. A “back propagation” network has at least one hidden layer with many non-linear units that
can learn any function or relationship between group of input variable whether discrete and for
continuous and output variable whether discrete and for continuous. This makes the technique of
MLP quite general, flexible ad non-linear tools.
When output layers is to be classified that has total number of nodes as total number of classes
and the node having highest value then it gives the output i.e., estimate of a class for which an in-
put is made. If there is a special case of two classes than, generally there is a node in output layer
and classification is carried between two classes and is done by applying cut off point to node
value.
An advantages of this technique, compared to classical modelling techniques, is that it does not
impose any sort of restriction with respect to the starting data (type of functional relationship be-
tween variables), neither does it usually start from specific assumptions (like the type of distribu-
tion the data follow). Another virtue of the technique lies in its capacity to estimate good models
even despite the existence of noise in the information analyzed, as occurs when there is a pres-
ence of omitted values or outlier values in the distribution of the variables. Hence, it is a robust
technique when dealing with problems of noise in the information presented; however, this does
not mean that the cleaning criteria of the data matrix should be relaxed.
3.3.1. Accuracy
Ludmila I. Kuncheva, Member, IEEE on health care data calculated accuracy of Multilayer perceptron
(MLP) as 84.25–89.50%.
3.3.2. Advantages of MLP
1) Capable of acting as a universal function approximator.
2) Capability to learn almost any relationship between input and output variables.
3.3.3. Disadvantages of MLP
1) Flexibility lies in the need to have sufficient training data and that it requires more time for its execution
than other techniques.
2) It is somewhat considered as complex “black box”.
3.4 Clustering Classifier
To identifying the prominent features of human and objects and recognizing them with a type one
can require the object clustering.
Basically clustering is unsupervised learning technique.
7. Computer Science & Information Technology (CS & IT) 91
The objective of clustering is to determine a fresh or different set
of classes or categories, the new groups are of concern in them-
selves, and their valuation is intrinsic. In this method, data objects
or instances groups into subset in such a method that similar in-
stances are grouped together and various different instances be-
long to the different groups.
Clustering is an unsupervised learning task, so no class values
representing a former combination of the data instances are given
(situation of supervised learning).
Clustering basically assembles data objects or instances into subset in such a method that similar
objects are assembled together, while different objects belong to different groups. The objects are
thereby prepared or organized into efficient representations that characterize the population being
sampled.
Clustering organization is denoted as a set of subsets C = C1 . . . Ck of S, such that:
φ=∩=
= ji
k
1i i CCandCS U for ji ≠ . Therefore, any object in S related to exactly one and only one
subset.
For example, consider the figure 4 where data set has three normal clusters.
Now consider the some real-life examples for illustrating clustering:
Example 1: Consider the people having similar size together to make small and large shirts.
1. Tailor-made for each person: expensive
2. One-size-fits-all: does not fit all.
Example 2: In advertising, segment consumers according to their similarities: To do targeted ad-
vertising.
Example 3: To create a topic hierarchy, we can take a group of text and organize those texts ac-
cording to their content matches.
There are two main types of measures used to estimate this relation: distance measures and simi-
larity measures.
Basically following are two kinds of measures used to guesstimate this relation
1. Distance measures and
2. Similarity measures
Distance Measures
To conclude the similarity and difference between the pair of instances, many clustering ap-
proaches usage the distance measures.
It is convenient to represent the distance between two instances let say xi and xj as: d (xi,xj). A
valid distance measure should be symmetric and gains its minimum value (usually zero) in case
of identical vectors.
Distance measures method is known as metric distance measure if follow the following proper-
ties:
Figure 4
8. 92 Computer Science & Information Technology (CS & IT)
S∈x,x∀
x=x⇒0=)x,x(d.2
S∈x,x,x∀
)x,x(d+)x,x(d≤)x,x(dinequalityTriangle.1
ji
jiji
kji
kjjiki
There are variations in distance measures depending upon the attribute in question.
3.4.1. Clustering Methods
A number of clustering algorithms are getting popular. The basic reason of a number of clustering
methods is that “cluster” is not accurately defined (Estivill-Castro, 2000). As a result many clus-
tering methods have been developed, using a different induction principle. Farley and Raftery
(1998) gave a classification of clustering methods into two main groups: hierarchical and parti-
tioning methods. Han and Kamber (2001) gave a different categorization: density-based methods,
model-based clustering and gridbased methods. A totally different categorization based on the in-
duction principle is given by (Estivill-Castro, 2000).
1. Hierarchical Methods
According to this method clusters are created by recursive partitioning of the instances in either a
top-down or bottom-up way.
2. Partitioning Methods
Here, starting from the first (starting) partition, instances are rearranged by changing the position
of instances from one cluster to another cluster.
The basic assumption is that the number of clusters will be set by the user before. In order to get
global optimality in partitioned-based clustering, an exhaustive enumeration process of all possi-
ble partitions is done.
3. Density-based Methods
The basis of this method is probability. It suggests that the instances that belong to each cluster
are taken from a specific probability distribution. The entire distribution of the data is supposed to
be a combination of numerous disseminations. The objective of these methods is to identify the
clusters and their distribution parameters. This particular method is designed in order to discover
clusters of arbitrary shape which are not necessarily convex.
4. Grid-based Methods
In grid-based method all the operations for clustering are performed in a grid structure and for
performing in the grid structure all the available space is divided into a fixed number of cells.
The basic main advantage of this method is its speed (Han and Kamber, 2001).
5. Soft-computing Methods
This includes techniques like that of neural networks.
3.4.2. Evaluation Criteria Measures for Clustering Technique
Generally, these criteria are split into two groups named Internal and External.
1. Internal Quality Criteria
This criteria generally measures compactness of clusters using similarity measures. It generally
takes into consideration intra-cluster homogeneity, the inter-cluster separability or a combination
of these two. It doesn’t use any exterior information beside the data itself.
9. Computer Science & Information Technology (CS & IT) 93
2. External Quality Criteria
They are useful for examining the structure of the clusters match to some already defined classifi-
cation of the objects.
3.4.3. Accuracy
Several different classifiers were used and the accuracy of the best classifier varied from 99.57% to
65.33%, depending on the data.
3.4.4. Advantages of Clustering Method
The main advantage of this method is that it offers the groups that (approximately) fulfil an opti-
mality measure.
3.4.5. Disadvantages of Clustering Method
1. There is no learning set of labelled observations.
2. Number of groups is usually unknown.
3. Implicitly, users already chooses the appropriate features and distance measure.
4. CONCLUSION
The important part to gather information always seems as what the people think. The rising ac-
cessibility of opinion rich resources such as online analysis websites and blogs rises as one can
simply search and recognize the opinions of some other one. One can precise his/her ideas and
opinions concerning goods and facilities. These view and thoughts are subjective figures which
signifies someone opinions, sentiments, emotional state or evaluation. In this paper, we present
different methods for data (feature or text) extraction and every method have some benefits and
limitations and one can use these methods according to the situation for feature and text extrac-
tion.
Based on the survey we can find the accuracy of different methods in different data set using n-
gram feature shown in table 2.
Table 2: Accuracy of Different Methods
Movie Reviews Product Reviews
Ngram
Feature
NB MLP SVM NB MLP SVM
75.50 81.05 81.15 62.50 79.27 79.40
According to our survey, accuracy of MLP is better than other three methods when we use Ngram
feature.
The four methods discussed in the paper are actually applicable in different areas like clustering is
applied in movie reviews and SVM techniques is applied in biological reviews & analysis. Al-
though in the field of opinion mining is new, but still the diverse methods available to provide a
way to implement these methods in various programming languages like PHP, Python etc. with
an outcome of innumerable applications. From a convergent point of view Naïve Bayes is best
suitable for textual classification, clustering for consumer services and SVM for biological read-
ing and interpretation.
10. 94 Computer Science & Information Technology (CS & IT)
ACKNOWLEDGEMENTS
Every good writing requires the help and support of many people for it to be truly good. I would
take the opportunity of thanking all those who extended a helping hand whenever I needed one.
I offer my heartfelt gratitude to Mr. Mohd. Shahid Husain who encouraged, guided and helped
me a lot in the project. I extent my thanks to Miss. Ratna Singh (fiancee) for her incandescent
help to complete this paper.
A vote of thanks to my family for their moral and emotional support. Above all utmost thanks to
the Almighty God for the divine intervention in this academic endeavour.
REFERENCES
[1] Ion SMEUREANU, Cristian BUCUR, Applying Supervised Opinion Mining Techniques on Online
User Reviews, Informatica Economică vol. 16, no. 2/2012.
[2] Bo Pang and Lillian Lee, “Opinion Mining and Sentiment Analysis”, Foundations and TrendsR_ in
Information Retrieval Vol. 2, Nos. 1–2 (2008).
[3] Abbasi, “Affect intensity analysis of dark web forums,” in Proceedings of Intelligence and Security
Informatics (ISI), pp. 282–288, 2007.
[4] K. Dave, S. Lawrence & D. Pennock. Mining the Peanut Gallery: Opinion Extraction and Semantic
Classi_cation of Product Reviews." Proceedings of the 12th International Conference on World Wide
Web, pp. 519-528, 2003.
[5] B. Liu. Web Data Mining: Exploring hyperlinks, contents, and usage data," Opinion Mining.
Springer, 2007.
[6] B. Pang & L. Lee, Seeing stars: Exploiting class relationships for sentiment categorization with re-
spect to rating scales." Proceedings of the Association for Computational Linguistics (ACL), pp.
15124,2005.
[7] Nilesh M. Shelke, Shriniwas Deshpande, Vilas Thakre, Survey of Techniques for Opinion Mining,
International Journal of Computer Applications (0975 – 8887) Volume 57– No.13, November 2012.
[8] Nidhi Mishra and C K Jha, Classification of Opinion Mining Techniques, International Journal of
Computer Applications 56 (13):1-6, October 2012, Published by Foundation of Computer Science,
New York, USA.
[9] Oded Z. Maimon, Lior Rokach, “Data Mining and Knowledge Discovery Handbook” Springer, 2005.
[10] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. “Sentiment classification using machine learn-
ing techniques.” In Proceedings of the 2002 Conference on Empirical Methods in Natural Language
Processing (EMNLP), pages 79–86.
[11] Towards Enhanced Opinion Classification using NLP Techniques, IJCNLP 2011, pages 101–107,
Chiang Mai, Thailand, November 13, 2011
Author
Pravesh Kumar Singh is a fine blend of strong scientific orientation and editing.
He is a Computer Science (Bachelor in Technology) graduate from a renowned
gurukul in India called Dr. Ram Manohar Lohia Awadh University with excel-
lence not only in academics but also had flagship in choreography. He mastered
in Computer Science and engineering from Integral University, Lucknow, India.
Currently he is acting as Head MCA (Master in Computer Applications) de-
partment in Thakur Publications and also working in the capacity of Senior Edi-
tor.