Text document classification assigns one or more predefined categories to a document based
on the likelihood expressed by a training set of labeled documents. Many machine learning algorithms play
an important role in training such systems on predefined categories. Because of the importance of the
machine learning approach, this study was taken up to examine text document classification based on the
available statistical event models. The aim of this paper is to present the important techniques and
methodologies employed for text document classification, while also drawing attention to some
of the interesting challenges that remain to be solved, focused mainly on text representation and machine
learning techniques.
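The "statistical event models" the abstract refers to are commonly instantiated as naive Bayes classifiers. As an illustration only (the paper's own models are not reproduced here), a minimal multinomial naive Bayes with Laplace smoothing might look like the sketch below; the function names and toy documents are invented for this sketch:

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs):
    """Train a multinomial naive Bayes model on (tokens, label) pairs.
    Returns log class priors and Laplace-smoothed log word likelihoods."""
    class_docs = defaultdict(int)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total = sum(class_docs.values())
    priors = {c: math.log(n / total) for c, n in class_docs.items()}
    likelihoods = {}
    for c, counts in word_counts.items():
        denom = sum(counts.values()) + len(vocab)  # Laplace (add-one) smoothing
        likelihoods[c] = {w: math.log((counts[w] + 1) / denom) for w in vocab}
        likelihoods[c]["<unk>"] = math.log(1 / denom)  # unseen words
    return priors, likelihoods

def classify(tokens, priors, likelihoods):
    """Pick the class maximising log prior + sum of token log-likelihoods."""
    def score(c):
        ll = likelihoods[c]
        return priors[c] + sum(ll.get(t, ll["<unk>"]) for t in tokens)
    return max(priors, key=score)
```

In the multinomial event model each word occurrence is an event, which is why counts rather than mere presence enter the likelihood; the multivariate Bernoulli model would instead use per-word presence indicators.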
An effective pre-processing algorithm for information retrieval systems (IJDMS)
The Internet is probably the most successful distributed computing system ever. However, our capabilities
for data querying and manipulation on the Internet are rudimentary at best. User expectations have grown
over the past few decades, along with the amount of operational data. The data user expects deeper,
more exact, and more detailed results. Result retrieval for a user query is always
relative to the pattern of data storage and indexing. In information retrieval systems, tokenization is an
integral part whose prime objective is to identify the tokens and their counts. In this paper, we
propose an effective tokenization approach based on a training vector, and the results show the
efficiency and effectiveness of the proposed algorithm. Tokenization of documents, which helps satisfy the user's
information need more precisely and sharply reduces the search space, is considered a part of information
retrieval. Pre-processing of the input documents is an integral part of tokenization: it generates the
tokens on whose basis probabilistic IR computes its scores and yields a reduced search space.
The comparative analysis is based on two
parameters: the number of tokens generated and the pre-processing time.
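The two comparison parameters above, number of tokens generated and pre-processing time, can be measured even with a very simple tokenizer. The sketch below is not the paper's training-vector approach; it is a generic baseline with an invented stop-word list:

```python
import re
import time
from collections import Counter

# Illustrative stop-word subset; a real IR system would use a fuller list.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def tokenize(document):
    """Lowercase, split on non-alphanumeric runs, drop stop words.
    Returns (token -> count mapping, pre-processing time in seconds)."""
    start = time.perf_counter()
    tokens = [t for t in re.split(r"[^a-z0-9]+", document.lower())
              if t and t not in STOP_WORDS]
    elapsed = time.perf_counter() - start
    return Counter(tokens), elapsed
```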
Classification-based Retrieval Methods to Enhance Information Discovery on th... (IJMIT Journal)
The widespread adoption of the World-Wide Web (the Web) has created challenges both for society as a whole and for the technology used to build and maintain the Web. The ongoing struggle of information retrieval systems is to wade through this vast pile of data and satisfy users by presenting them with information that most adequately fits their needs. On a societal level, the Web is expanding faster than we can comprehend its implications or develop rules for its use. The ubiquitous use of the Web has raised important social concerns in the areas of privacy, censorship, and access to information. On a technical level, the novelty of the Web and the pace of its growth have created challenges not only in the development of new applications that realize the power of the Web, but also in the technology needed to scale applications to accommodate the resulting large data sets and heavy loads. This thesis presents searching algorithms and hierarchical classification techniques for increasing a search service's understanding of web queries. Existing search services rely solely on a query's occurrence in the document collection to locate relevant documents. They typically do not perform any task or topic-based analysis of queries using other available resources, and do not leverage changes in user query patterns over time. Provided within are a set of techniques and metrics for performing temporal analysis on query logs. Our log analyses are shown to be reasonable and informative, and can be used to detect changing trends and patterns in the query stream, thus providing valuable data to a search service.
Sentimental classification analysis of polarity multi-view textual data using... (IJECE IAES)
The data and information available in most community environments are complex in nature. Sentimental data resources may consist of textual data collected from multiple information sources with different representations, usually handled by different analytical models. These data resource characteristics can form multi-view polarity textual data. However, knowledge creation from this type of sentimental textual data requires considerable analytical effort and capability. In particular, data mining practices can provide exceptional results in handling textual data formats. Moreover, when the textual data exists in multi-view or unstructured formats, hybrid and integrated analysis efforts of text data mining algorithms are vital to obtaining helpful results. The objective of this research is to enhance knowledge discovery from sentimental multi-view textual data, which can be considered an unstructured data format, by classifying the polarity information documents into two categories of useful information. A framework with integrated data mining algorithms is proposed in this paper, achieved through the application of the X-means algorithm for clustering and the HotSpot algorithm for association rules. The analysis results show improved accuracy in classifying the sentimental multi-view textual data into two categories through the application of the proposed framework on an online polarity user-reviews dataset for given topics.
Context Driven Technique for Document Classification (IDES Editor)
In this paper we present an innovative hybrid Text
Classification (TC) system that bridges the gap between
statistical and context-based techniques. Our algorithm
harnesses contextual information at two stages. First, it extracts
a cohesive set of keywords for each category using lexical
references, implicit context derived from LSA, and
word-vicinity-driven semantics. Secondly, each document is
represented by a set of context rich features whose values are
derived by considering both lexical cohesion as well as the extent
of coverage of salient concepts via lexical chaining. After
keywords are extracted, a subset of the input documents is
apportioned as the training set. Its members are assigned categories
based on their keyword representation. These labeled
documents are used to train binary SVM classifiers, one for
each category. The remaining documents are supplied to the
trained classifiers in the form of their context-enhanced feature
vectors. Each document is finally ascribed its appropriate
category by an SVM classifier.
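The final stage described above, one binary SVM per category with each document assigned by the best-scoring classifier, can be sketched as follows. This is a generic one-vs-rest linear SVM trained by Pegasos-style subgradient descent on the hinge loss, not the authors' implementation; all names and the toy feature vectors are illustrative:

```python
import random

def train_linear_svm(X, y, epochs=200, lam=0.01):
    """Pegasos-style subgradient descent on the hinge loss.
    X: list of feature vectors; y: labels in {-1, +1}."""
    random.seed(0)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in random.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink weights, then step toward examples inside the margin.
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def train_one_vs_rest(X, labels):
    """One binary SVM per category, as in the abstract."""
    return {c: train_linear_svm(X, [1 if l == c else -1 for l in labels])
            for c in set(labels)}

def predict(models, x):
    """Assign the category whose classifier gives the highest score."""
    return max(models, key=lambda c: sum(wj * xj for wj, xj in zip(models[c], x)))
```

In practice the feature vectors would be the context-enhanced representations the paper describes; here they are plain dense vectors to keep the sketch short.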
Performance Evaluation of Query Processing Techniques in Information Retrieval (idescitation)
The first element of the search process is the query.
Since the user query is on average restricted to two or three
keywords, it appears ambiguous to the search engine.
Given the user query, the goal of an Information Retrieval
[IR] system is to retrieve information which might be useful
or relevant to the information need of the user. Hence,
query processing plays an important role in an IR system.
Query processing can be divided into four categories,
i.e. query expansion, query optimization, query classification, and
query parsing. In this paper an attempt is made to evaluate the
performance of query processing algorithms in each
category. The evaluation was based on the dataset specified by the
Forum for Information Retrieval [FIRE15]. The criteria used
for evaluation are precision and relative recall. The analysis is
based on the importance of each step in query processing. The
experimental results show the significance of each step
in query processing, as well as the relevance of web semantics
and spelling correction in the user query.
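Precision and relative recall, the two evaluation criteria named above, can be computed as in the sketch below. The formulation of relative recall (a system's relevant hits divided by the pooled relevant hits found by all compared systems) is one common convention, and the data in the test are invented:

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def relative_recall(retrieved_by_system, relevant):
    """Relative recall per system: its relevant hits divided by the
    pooled relevant hits found by all systems together."""
    pooled = set()
    for retrieved in retrieved_by_system.values():
        pooled |= set(retrieved) & set(relevant)
    return {name: (len(set(r) & set(relevant)) / len(pooled) if pooled else 0.0)
            for name, r in retrieved_by_system.items()}
```

Relative recall is useful when, as here, the full set of relevant documents in the collection is unknown and only the union of what the compared systems found can serve as the denominator.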
Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ... (IJERA Editor)
Although publicly accessible databases containing speech documents exist, the time and effort
required to keep them up to date is often burdensome. In an effort to help identify the speaker of a speech
when its text is available, text-mining tools from the machine learning discipline can be applied to help in this process.
Here, we describe and evaluate document classification algorithms, i.e. a combination of text mining and
classification. This task asked participants to design classifiers for identifying documents containing
speech-related information in the main literature, and evaluated them against one another. The proposed system utilizes a
novel approach of k-nearest neighbour classification and compares its performance for different values of
k.
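A plain k-nearest-neighbour document classifier with varying k, the core of the approach described, might be sketched as follows. Cosine similarity over bag-of-words vectors is an assumption of this sketch; the paper's dynamic attribute weighting and bootstrapping are not reproduced:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(doc, training, k):
    """Majority vote among the k most similar labelled documents.
    training: list of (Counter, label) pairs."""
    neighbours = sorted(training, key=lambda t: cosine(doc, t[0]), reverse=True)[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]
```

Varying k trades variance against bias: k=1 follows the single closest document, while larger k smooths the decision over more neighbours.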
Data and Information Integration: Information Extraction (IJMER)
Information extraction is generally concerned with locating different items in a document, whether a textual or a web document. This paper is concerned with the methodologies and applications of information extraction. The field of information extraction plays a very important role in the natural language processing community. The architecture of an information extraction system, which acts as the base for all languages and fields, is also discussed along with its different components. Information is hidden in the large volume of web pages, and thus it is necessary to extract useful information from the web content; this process is called information extraction. In information extraction, given a sequence of instances, we identify and pull out a sub-sequence of the input that represents the information we are interested in.
Manual data extraction from semi-structured web pages is a difficult task. This paper focuses on the study of various data extraction techniques, including several web data extraction techniques. In past years, there was a rapid expansion of activity in the information extraction area, and many methods have been proposed for automating the extraction process. We survey various web data extraction tools, and several real-world applications of information extraction are introduced; the role information extraction plays in different fields is discussed through these applications. Current challenges faced by the available information extraction techniques are briefly discussed, along with ongoing and future work based on current research.
Novel Database-Centric Framework for Incremental Information Extraction (ijsrd.com)
Information extraction (IE) has been an active research area that seeks techniques to uncover information from large collections of text. IE is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most cases this activity concerns processing human language texts by means of natural language processing (NLP). Recent activities in document processing, such as automatic annotation and content extraction, can be seen as information extraction. Many applications call for methods to enable automatic extraction of structured information from unstructured natural language text. Due to the inherent challenges of natural language processing, most of the existing methods for information extraction from text tend to be domain specific. In this project, a new paradigm for information extraction is presented. In this extraction framework, the intermediate output of each text processing component is stored, so that only an improved component has to be re-deployed to the entire corpus. Extraction is then performed on both the previously processed data from the unchanged components and the updated data generated by the improved component. Performing this kind of incremental extraction can result in a tremendous reduction in processing time. There is also a mechanism to generate extraction queries from both labeled and unlabeled data; query generation is critical so that casual users can specify their information needs without learning the query language.
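The incremental-extraction idea above, storing each component's intermediate output so that only an improved component is re-run, can be illustrated with a toy pipeline cache. The keying scheme (a stage is recomputed when its own version or any upstream version changes) and all names are assumptions of this sketch, not the paper's framework:

```python
def run_pipeline(corpus, components, cache, run_counts):
    """components: list of (name, version, fn) stages applied in order.
    A stage's output is cached under the versions of itself and all
    upstream stages, so unchanged prefixes of the pipeline are reused."""
    data = corpus
    signature = ()
    for name, version, fn in components:
        signature += ((name, version),)
        if signature not in cache:
            run_counts[name] = run_counts.get(name, 0) + 1  # count real executions
            cache[signature] = [fn(d) for d in data]
        data = cache[signature]
    return data
```

Upgrading one downstream component then reuses every upstream result, which is exactly the saving the abstract claims for incremental extraction.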
An efficient approach for web query preprocessing (IAES IJEECS)
The emergence of Web technology has generated a massive amount of raw data by enabling Internet users to post their opinions, comments, and reviews on the web. Extracting useful information from this raw data can be a very challenging task, and search engines play a critical role in these circumstances. User queries are becoming a main issue for search engines; therefore, a preprocessing operation is essential. In this paper, we present a framework for natural language preprocessing for efficient data retrieval, along with some of the processing required for effective retrieval, such as elongated word handling, stop word removal, and stemming. This manuscript starts by building a manually annotated dataset and then takes the reader through the detailed steps of the process. Experiments are conducted at specific stages of this process to examine the accuracy of the system.
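The preprocessing steps named above (elongated word handling, stop word removal, stemming) might be sketched as follows. The stop-word list and the deliberately naive suffix-stripping stemmer are placeholders, not the paper's components:

```python
import re

# Illustrative stop-word subset only.
STOP_WORDS = {"the", "a", "an", "is", "to", "of"}

def collapse_elongation(word):
    """Reduce runs of 3+ repeated letters to one ('sooooo' -> 'so')."""
    return re.sub(r"(.)\1{2,}", r"\1", word)

def stem(word):
    """Very naive suffix stripping; a real system would use Porter stemming."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess_query(query):
    """Lowercase, tokenize, drop stop words, then normalize each token."""
    tokens = re.findall(r"[a-z]+", query.lower())
    return [stem(collapse_elongation(t)) for t in tokens if t not in STOP_WORDS]
```

Note the ordering: elongation is collapsed before stemming, so "caaaars" first becomes "cars" and only then loses its plural suffix.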
Survey on Existing Text Mining Frameworks and A Proposed Idealistic Framework... (ijceronline)
Semantic Based Model for Text Document Clustering with Idioms (Waqas Tariq)
Text document clustering has become an increasingly important problem in recent years because of the tremendous amount of unstructured data available in various forms in online forums such as the web, social networks, and other information networks. Clustering is a very powerful data mining technique for organizing the large amount of information on the web. Traditionally, document clustering methods do not consider the semantic structure of the document. This paper addresses the task of developing an effective and efficient method that exploits the semantic structure of text documents. A method has been developed that performs the following: tagging the documents for parsing, replacing idioms with their original meaning, calculating semantic weights for document words, and applying a semantic grammar. The similarity measure is computed between the documents, and the documents are then clustered using a hierarchical clustering algorithm. The method adopted in this work is evaluated on different data sets with standard performance measures, and its effectiveness in developing meaningful clusters has been demonstrated.
A template based algorithm for automatic summarization and dialogue managemen... (eSAT Journals)
Abstract: This paper describes an automated approach for extracting significant and useful events from unstructured text. The goal of this research is to arrive at a methodology that helps in extracting important events such as dates, places, and subjects of interest. It would also be convenient if the methodology presented users with a shorter version of the text containing all non-trivial information. We also discuss our implementation of algorithms that perform exactly this task. Key Words: Cosine Similarity, Information, Natural Language, Summarization, Text Mining
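An extractive summarizer built on cosine similarity, as the keywords suggest, can be sketched by scoring each sentence against the document's overall term vector and keeping the top-scoring sentences. This generic formulation is an assumption for illustration, not the authors' template-based algorithm:

```python
import math
import re
from collections import Counter

def sentence_vectors(text):
    """Split into sentences and build a bag-of-words Counter for each."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return sentences, [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def summarize(text, n=1):
    """Keep the n sentences most similar to the document as a whole."""
    sentences, vecs = sentence_vectors(text)
    doc = Counter()
    for v in vecs:
        doc.update(v)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(vecs[i], doc), reverse=True)
    keep = sorted(ranked[:n])  # preserve the original sentence order
    return ". ".join(sentences[i] for i in keep) + "."
```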
Text preprocessing is a vital stage in text classification (TC) in particular and text mining in general. The aim of text preprocessing tools is to reduce the multiple forms of a word to one form. In addition, text preprocessing techniques have received a great deal of attention and are widely studied in machine learning. The basic phase in text classification involves preprocessing features and extracting relevant features against the features in a database. Preprocessing tools have a great impact on reducing the time and processing resources needed. The effect of preprocessing tools on English text classification is an active area of research. This paper provides an evaluation study of several preprocessing tools for English text classification. The study includes using the raw text, tokenization, stop word removal, and stemming. Two different methods, chi-square and TF-IDF with a cosine similarity score, are used for feature extraction on the BBC English dataset. The experimental results show that text preprocessing affects the feature extraction methods and enhances the performance of English text classification, especially for small threshold values.
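Of the two feature-extraction methods mentioned, TF-IDF is easy to sketch. The formulation below (term frequency normalized by document length, IDF as log(N/df)) is one standard variant, not necessarily the exact one used in the study:

```python
import math
from collections import Counter

def tf_idf(corpus):
    """corpus: list of token lists. Returns one TF-IDF weight dict per document."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # document frequency: count each word once per doc
    idf = {w: math.log(n / df[w]) for w in df}
    vectors = []
    for doc in corpus:
        tf = Counter(doc)
        vectors.append({w: (tf[w] / len(doc)) * idf[w] for w in tf})
    return vectors
```

A word that appears in many documents (high df) gets a low IDF and thus a low weight, which is the behaviour the threshold experiments in the abstract probe.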
Supplier ranking and selection in a bakery (IOSR Journals)
Supplier evaluation and selection is a fundamental problem in supply chain management. Many
companies may not know how to evaluate proposed suppliers by integrating the different criteria upon which they
want to base their decision. A number of techniques have been employed to solve this problem but were not
able to sufficiently incorporate qualitative criteria into the estimation of the alternatives.
A multi-criteria decision-making (MCDM) methodology, the Analytical Hierarchy Process (AHP), which takes
into consideration both quantitative and qualitative criteria, was used to evaluate three suppliers of improvers
(a major ingredient) for bread production in a bakery, UIB, in southwest Nigeria. This method uses a ranking
scale when comparing alternatives. A consistency ratio is estimated once data have been collected to check
the consistency of judgments and ensure an accurate result is obtained with the method.
It was discovered that Supplier C adds the most value to UIB because it had the highest priority weight of 0.346,
although it was closely followed by suppliers B and A with weights of 0.336 and 0.317 respectively.
Results showed that each supplier fared well under one criterion or another, and there was generally
good performance from all suppliers.
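The AHP computation described, priority weights from a pairwise comparison matrix plus a consistency ratio, can be sketched as follows. The geometric-mean approximation of the principal eigenvector and Saaty's random-index table are standard AHP practice, but this is an illustration rather than the study's exact calculation, and the matrix in the test is invented:

```python
import math

# Saaty's random consistency index by matrix size.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}

def ahp_weights(matrix):
    """Priority weights via the geometric-mean approximation of the
    principal eigenvector of a pairwise comparison matrix."""
    n = len(matrix)
    gm = [math.prod(row) ** (1 / n) for row in matrix]
    total = sum(gm)
    return [g / total for g in gm]

def consistency_ratio(matrix):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1);
    CR < 0.1 is conventionally considered acceptable."""
    n = len(matrix)
    w = ahp_weights(matrix)
    # lambda_max: average ratio of (A w)_i to w_i.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lam - n) / (n - 1)
    return ci / RI[n]
```

For a perfectly consistent matrix (every entry a_ij equals w_i / w_j), lambda_max equals n and the consistency ratio is zero.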
Rehabilitation Process and Persons with Physical Dysfunctions (IOSR Journals)
Abstract: The main purpose of this study is to determine rehabilitation process and persons with physical
dysfunctions. To achieve the purpose of this study, three hypotheses were formulated. Ex-post facto research
design was adopted for the study. A sample of one hundred persons with disabilities was randomly selected for
the study. The selection was done through the simple random sampling technique. This was to give equal and
independent opportunity to all the respondents to be selected for the study. The questionnaire was the major
instrument used for data collection. The instrument was subjected to both face and content validation by experts
in measurement and evaluation. The reliability estimate of the instrument was established through the test-retest
reliability method. Pearson product-moment correlation analysis and an independent t-test were employed to
test the hypotheses at the .05 level of significance. The result of the analysis reveals that rehabilitation significantly
relates to persons with orthopedic and neurological impairments. The result also revealed that there is a
significant difference between male and female disabled persons in their perception of rehabilitation of persons
with other health impairments.
Keywords: Rehabilitation process, persons, physical, dysfunctions.
Uniform particle distribution by a newer method in composite metal of Al/SiC (IOSR Journals)
Abstract: The preparation of metal composites reinforced with ceramic particles through the casting process
does not give uniform results because of poor wettability. The major difficulty is to get a uniform distribution of
reinforcement, especially at higher volume fractions. An innovative method of producing cast composites is
tried in the present study to overcome this problem, since homogeneity of the matrix is needed. The method involves
multi-axis rotation of liquid aluminum and silicon carbide particulates packed in a steel pipe inside a rotating drum.
Up to 65% by volume of SiC is incorporated into the metal (aluminum) by this technique. Physical properties
such as hardness, micro-hardness, density, and microstructure have been studied. The distribution of
particles as well as the mechanical properties are better compared to those of stir-cast composites with a similar
volume fraction of silicon carbide reinforcement. The composite with 65 volume percent of silicon carbide
particulates showed a Rockwell hardness value of 67 RB. In a few locations the microstructure showed a
non-uniform distribution, which can be neglected. There was segregation of silicon carbide particles at a
particular location, and the hardness obtained there was much higher. The particle distribution is a result
of the combined influence of the random mixing of particles with liquid aluminum and the solidification pattern
obtained.
Keywords: Multi-axis rotation, microstructure, MMC, Al-SiC matrix
Online Technology and Marketing of Financial Services in Nigeria: An Impact A... (IOSR Journals)
This study extracts and analyses basic information from service providers and customers, reporting the performance and challenges of online marketing of financial services in Nigeria. Staff and customers of twelve (12) of the leading financial institutions in Nigeria with track records of online activities were purposively sampled with the aid of structured questionnaires to elicit ex post facto facts for use in this study. The resultant data were analysed using descriptive statistics and the Pearson correlation coefficient in the Statistical Package for the Social Sciences (SPSS), version 16. The study reveals low impact, meaning the country has not realised her full potential in harnessing online technology for the marketing of financial services. It also reveals challenges associated with the implementation and performance of online marketing of financial services in Nigeria, which include fluctuation in online services due to poor infrastructure, a high rate of online fraudsters, low awareness on the part of customers, and e-payment bottlenecks, among others. Finally, this study expounds a number of management options to stimulate the impact of online technology in marketing financial services in Nigeria.
Sentimental classification analysis of polarity multi-view textual data using...IJECEIAES
The data and information available in most community environments is complex in nature. Sentimental data resources may possibly consist of textual data collected from multiple information sources with different representations and usually handled by different analytical models. These types of data resource characteristics can form multi-view polarity textual data. However, knowledge creation from this type of sentimental textual data requires considerable analytical efforts and capabilities. In particular, data mining practices can provide exceptional results in handling textual data formats. Besides, in the case of the textual data exists as multi-view or unstructured data formats, the hybrid and integrated analysis efforts of text data mining algorithms are vital to get helpful results. The objective of this research is to enhance the knowledge discovery from sentimental multi-view textual data which can be considered as unstructured data format to classify the polarity information documents in the form of two different categories or types of useful information. A proposed framework with integrated data mining algorithms has been discussed in this paper, which is achieved through the application of X-means algorithm for clustering and HotSpot algorithm of association rules. The analysis results have shown improved accuracies of classifying the sentimental multi-view textual data into two categories through the application of the proposed framework on online polarity user-reviews dataset upon a given topics.
Context Driven Technique for Document ClassificationIDES Editor
In this paper we present an innovative hybrid Text
Classification (TC) system that bridges the gap between
statistical and context based techniques. Our algorithm
harnesses contextual information at two stages. First it extracts
a cohesive set of keywords for each category by using lexical
references, implicit context as derived from LSA and wordvicinity
driven semantics. And secondly, each document is
represented by a set of context rich features whose values are
derived by considering both lexical cohesion as well as the extent
of coverage of salient concepts via lexical chaining. After
keywords are extracted, a subset of the input documents is
apportioned as training set. Its members are assigned categories
based on their keyword representation. These labeled
documents are used to train binary SVM classifiers, one for
each category. The remaining documents are supplied to the
trained classifiers in the form of their context-enhanced feature
vectors. Each document is finally ascribed its appropriate
category by an SVM classifier.
Performance Evaluation of Query Processing Techniques in Information Retrievalidescitation
The first element of the search process is the query.
The user query being on an average restricted to two or three
keywords makes the query ambiguous to the search engine.
Given the user query, the goal of an Information Retrieval
[IR] system is to retrieve information which might be useful
or relevant to the information need of the user. Hence, the
query processing plays an important role in IR system.
The query processing can be divided into four categories
i.e. query expansion, query optimization, query classification and
query parsing. In this paper an attempt is made to evaluate the
performance of query processing algorithms in each of the
category. The evaluation was based on dataset as specified by
Forum for Information Retrieval [FIRE15]. The criteria used
for evaluation are precision and relative recall. The analysis is
based on the importance of each step in query processing. The
experimental results show that the significance of each step
in query processing and also the relevance of web semantics
and spelling correction in the user query.
Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ...IJERA Editor
Although publicly accessible databases containing speech documents. It requires a great deal of time and effort
required to keep them up to date is often burdensome. In an effort to help identify speaker of speech if text is
available, text-mining tools, from the machine learning discipline, it can be applied to help in this process also.
Here, we describe and evaluate document classification algorithms i.e. a combo pack of text mining and
classification. This task asked participants to design classifiers for identifying documents containing speech
related information in the main literature, and evaluated them against one another. Expected systems utilizes a
novel approach of k -nearest neighbour classification and compare its performance by taking different values of
k.
Data and Information Integration: Information ExtractionIJMER
Information extraction is generally concerned with the location of different items in any document, may be textual or web document. This paper is concerned with the methodologies and applications of information extraction. The field of information extraction plays a very important role in the natural language processing community. The architecture of information extraction system which acts as the base for all languages and fields is also discussed along with its different components. Information is hidden in the large volume of web pages and thus it is necessary to extract useful information from the web content, called Information Extraction. In information extraction, given a sequence of instances, we identify and pull out a sub-sequence of the input that represents information we are interested in.
Manual data extraction from semi supervised web pages is a difficult task. This paper focuses on study of various data extraction techniques and also some web data extraction techniques. In the past years, there was a rapid expansion of activities in the information extraction area. Many methods have been proposed for automating the process of extraction. We will survey various web data extraction tools. Several real-world applications of information extraction will be introduced. What role information extraction plays in different fields is discussed in these applications. Current challenges being faced by the available information extraction techniques are briefly discussed along with the future work going on using the current researches is discussed.
Novel Database-Centric Framework for Incremental Information Extractionijsrd.com
Information extraction (IE) has been an active research area that seeks techniques to uncover information from a large collection of text. IE is the task of automatically extracting structured information from unstructured and/or semi structured machine-readable documents. In most of the cases this activity concerns processing human language texts by means of natural language processing (NLP). Recent activities in document processing like automatic annotation and content extraction could be seen as information extraction. Many applications call for methods to enable automatic extraction of structured information from unstructured natural language text. Due to the inherent challenges of natural language processing, most of the existing methods for information extraction from text tend to be domain specific. In this project a new paradigm for information extraction. In this extraction framework, intermediate output of each text processing component is stored so that only the improved component has to be deployed to the entire corpus. Extraction is then performed on both the previously processed data from the unchanged components as well as the updated data generated by the improved component. Performing such kind of incremental extraction can result in a tremendous reduction of processing time and there is a mechanism to generate extraction queries from both labeled and unlabeled data. Query generation is critical so that casual users can specify their information needs without learning the query language.
An efficient approach for web query preprocessingIAESIJEECS
The emergence of Web technology generated a massive amount of raw data by enabling Internet users to post their opinions, comments, and reviews on the web. Extracting useful information from this raw data can be a very challenging task, and search engines play a critical role in these circumstances. User queries are becoming a main issue for search engines; therefore, a preprocessing operation is essential. In this paper, we present a framework for natural-language preprocessing for efficient data retrieval, covering some of the processing required for effective retrieval, such as elongated-word handling, stop-word removal, and stemming. The manuscript starts by building a manually annotated dataset and then takes the reader through the detailed steps of the process. Experiments are conducted for the individual stages of this process to examine the accuracy of the system.
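A minimal sketch of the preprocessing steps the abstract lists (elongated-word handling, stop-word removal, stemming); the stop-word list and the suffix-stripping "stemmer" are toy stand-ins for real resources such as a Porter stemmer:

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}

def collapse_elongated(word):
    # "sooooo" -> "soo": cap any run of a repeated character at two
    return re.sub(r"(.)\1{2,}", r"\1\1", word)

def light_stem(word):
    # crude suffix stripping; a real system would use e.g. a Porter stemmer
    for suf in ("ing", "ed", "es", "s"):
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def preprocess(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    tokens = [collapse_elongated(t) for t in tokens]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [light_stem(t) for t in tokens]
```

Each stage is a pure function, so the pipeline can be evaluated (and its accuracy examined) stage by stage, as the paper describes.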
Survey on Existing Text Mining Frameworks and A Proposed Idealistic Framework...ijceronline
Semantic Based Model for Text Document Clustering with IdiomsWaqas Tariq
Text document clustering has become an increasingly important problem in recent years because of the tremendous amount of unstructured data available in various forms in online forums such as the web, social networks, and other information networks. Clustering is a very powerful data-mining technique for organizing the large amount of information on the web. Traditionally, document clustering methods do not consider the semantic structure of the document. This paper addresses the task of developing an effective and efficient method to improve the semantic structure of text documents. A method has been developed that performs the following: tagging the documents for parsing, replacing idioms with their original meanings, calculating semantic weights for document words, and applying a semantic grammar. A similarity measure is obtained between the documents, and the documents are then clustered using a hierarchical clustering algorithm. The method adopted in this work is evaluated on different datasets with standard performance measures, and the effectiveness of the method in producing meaningful clusters has been demonstrated.
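The pipeline described above, idiom replacement followed by similarity-based hierarchical clustering, can be sketched as follows; the idiom dictionary is illustrative, and plain term counts stand in for the paper's semantic weights:

```python
import math
from collections import Counter

IDIOMS = {"kick the bucket": "die", "piece of cake": "easy"}  # illustrative

def normalize(text):
    # replace idioms with their literal meaning, then count terms
    text = text.lower()
    for idiom, meaning in IDIOMS.items():
        text = text.replace(idiom, meaning)
    return Counter(text.split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(docs, k):
    # single-linkage agglomerative clustering on cosine similarity
    vecs = [normalize(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > k:
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: max(cosine(vecs[a], vecs[b])
                               for a in clusters[ij[0]] for b in clusters[ij[1]]),
        )
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters
```

Without the idiom step, "piece of cake" and "easy" share no terms; with it, the paraphrased documents end up in the same cluster.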
A template based algorithm for automatic summarization and dialogue managemen...eSAT Journals
Abstract: This paper describes an automated approach for extracting significant and useful events from unstructured text. The goal of the research is to arrive at a methodology that helps extract important events such as dates, places, and subjects of interest. It would also be convenient if the methodology presented users with a shorter version of the text containing all non-trivial information. We also discuss our implementation of algorithms that perform exactly this task. Key Words: Cosine Similarity, Information, Natural Language, Summarization, Text Mining
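A toy version of the two pieces, pattern-based date extraction and cosine-similarity sentence ranking, might look like this; the date patterns are simplified assumptions, not the paper's templates:

```python
import math
import re
from collections import Counter

def extract_dates(text):
    # ISO dates plus "12 March 2021"-style dates; simplified patterns only
    return re.findall(
        r"\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}\s+[A-Z][a-z]+\s+\d{4}\b", text)

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def summarize(text, n=1):
    # keep the n sentences most similar (cosine) to the document as a whole
    sents = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    doc_vec = Counter(text.lower().split())
    ranked = sorted(sents, key=lambda s: cosine(Counter(s.lower().split()), doc_vec),
                    reverse=True)
    return ranked[:n]
```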
Text preprocessing is a vital stage in text classification (TC) in particular and in text mining in general. The purpose of text preprocessing tools is to reduce the multiple forms of a word to one form. Text preprocessing techniques are also given considerable significance and are widely studied in machine learning. The basic phase in text classification involves preprocessing features and extracting relevant features against the features in a database; preprocessing has a great impact on reducing the time and computational resources required. The effect of preprocessing tools on English text classification is an active area of research. This paper provides an evaluation study of several preprocessing tools for English text classification. The study covers using the raw text, tokenization, stop-word removal, and stemming. Two feature-extraction methods, chi-square and TF-IDF with a cosine-similarity score, are used on the BBC English dataset. The experimental results show that text preprocessing affects the feature-extraction methods and enhances the performance of English text classification, especially for small threshold values.
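The chi-square feature scoring mentioned above can be sketched with the standard 2x2 contingency statistic; the corpus and labels below are invented for illustration:

```python
def chi_square(A, B, C, D):
    """Chi-square statistic for a 2x2 term/class contingency table:
    A = in-class docs containing the term, B = in-class docs without it,
    C = out-of-class docs with it, D = out-of-class docs without it."""
    N = A + B + C + D
    den = (A + B) * (C + D) * (A + C) * (B + D)
    return N * (A * D - B * C) ** 2 / den if den else 0.0

def rank_terms(docs, labels, target):
    # score every vocabulary term against one target class, highest first
    scores = {}
    for term in {t for d in docs for t in d.split()}:
        A = sum(1 for d, l in zip(docs, labels) if l == target and term in d.split())
        B = labels.count(target) - A
        C = sum(1 for d, l in zip(docs, labels) if l != target and term in d.split())
        D = (len(labels) - labels.count(target)) - C
        scores[term] = chi_square(A, B, C, D)
    return sorted(scores, key=scores.get, reverse=True)
```

Keeping only the top-scoring terms as features is what makes the choice of threshold matter, as the experiments above indicate.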
Supplier ranking and selection in a bakeryIOSR Journals
Supplier evaluation and selection is a fundamental problem in supply chain management. Many companies may not know how to evaluate proposed suppliers so as to integrate the different criteria upon which they want to base their decision. A number of techniques have been employed to solve this problem, but they were not able to sufficiently incorporate qualitative criteria into the estimation of alternatives. A multi-criteria decision-making (MCDM) methodology, the Analytic Hierarchy Process (AHP), which takes into consideration both quantitative and qualitative criteria, was used to evaluate three suppliers of improvers (a major ingredient) for bread production in a bakery, UIB, in southwest Nigeria. This method uses a ranking scale when comparing alternatives. A consistency ratio is estimated once the data have been collected, to check the consistency of judgments and ensure an accurate result is obtained with the method. It was discovered that Supplier C adds the most value to UIB because it had the highest priority weight of 0.346, although it was closely followed by suppliers B and A with weights of 0.336 and 0.317 respectively. Results showed that each supplier fared well under one criterion or another, and there was generally good performance from all suppliers.
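The AHP computation the abstract refers to, priority weights from a pairwise-comparison matrix plus a consistency ratio, can be sketched as follows (column-average approximation to the principal eigenvector, with Saaty's random indices assumed):

```python
def ahp_weights(M):
    """Priority weights from a pairwise-comparison matrix via normalized
    column averages (a common approximation to the principal eigenvector)."""
    n = len(M)
    col = [sum(M[i][j] for i in range(n)) for j in range(n)]
    return [sum(M[i][j] / col[j] for j in range(n)) / n for i in range(n)]

def consistency_ratio(M, w):
    """Saaty's CR = CI / RI; judgments are usually accepted when CR < 0.10."""
    n = len(M)
    Mw = [sum(M[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam = sum(Mw[i] / w[i] for i in range(n)) / n   # estimate of lambda_max
    CI = (lam - n) / (n - 1)
    RI = {3: 0.58, 4: 0.90, 5: 1.12}[n]             # Saaty's random index
    return CI / RI
```

For a perfectly consistent matrix the estimated lambda_max equals n, so the consistency ratio is zero; noisy human judgments push it upward.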
Rehabilitation Process and Persons with Physical DysfunctionsIOSR Journals
Abstract: The main purpose of this study is to determine the rehabilitation process for persons with physical dysfunctions. To achieve the purpose of this study, three hypotheses were formulated. An ex-post facto research design was adopted. A sample of one hundred persons with disabilities was randomly selected for the study. The selection was done through the simple random sampling technique, so as to give all respondents an equal and independent opportunity to be selected. The questionnaire was the major instrument used for data collection. The instrument was subjected to both face and content validation by experts in measurement and evaluation. The reliability estimate of the instrument was established through the test-retest reliability method. Pearson product-moment correlation analysis and the independent t-test were employed to test the hypotheses at the .05 level of significance. The result of the analysis reveals that rehabilitation relates significantly to persons with orthopedic and neurological impairments. The result also revealed a significant difference between male and female disabled persons in their perception of the rehabilitation of persons with other health impairments.
Keywords: Rehabilitation process, persons, physical, dysfunctions.
Uniform particle distribution by a newer method in composite metal of Al/SiCIOSR Journals
Abstract: Composites of metal reinforced with ceramic particles prepared through the casting process are not uniform because of poor wettability. The major difficulty is obtaining a uniform distribution of reinforcement, especially at higher volume fractions. An innovative method of producing cast composites is tried in the present study to overcome this problem and achieve homogeneity of the matrix. The method involves multi-axis rotation of liquid aluminium and silicon carbide particulates packed in a steel pipe inside a rotating drum. Up to 65% by volume of SiC is incorporated into the metal (aluminium) by this technique. Physical properties such as hardness, micro-hardness, density and microstructure have been studied. The particle distribution as well as the mechanical properties are better compared to those of stir-cast composites with a similar volume fraction of silicon carbide reinforcement. The composite with 65 volume percent of silicon carbide particulates showed a Rockwell hardness value of 67 RB. In a few locations the microstructure showed a non-uniform distribution, which can be neglected. There was segregation of silicon carbide particles at particular locations, and the hardness obtained there was much higher. The particle distribution is a result of the combined influence of random mixing of particles and liquid aluminium and of the solidification pattern obtained.
Keywords: Multi-axis rotation, microstructure, MMC, Al-SiC matrix
Online Technology and Marketing of Financial Services in Nigeria: An Impact A...IOSR Journals
This study extracts and analyses basic information from service providers and customers, reporting the performance and challenges of online marketing of financial services in Nigeria. Staff and customers of twelve (12) of the leading financial institutions in Nigeria with track records of online activity were purposively sampled, with the aid of structured questionnaires, to elicit ex post facto facts for use in this study. The resultant data were analysed using descriptive statistics and the Pearson correlation coefficient in the Statistical Package for the Social Sciences (SPSS), version 16. The study reveals low impact, meaning the country has not realised its full potential in harnessing online technology for the marketing of financial services. It also reveals challenges associated with the implementation and performance of online marketing of financial services in Nigeria, which include fluctuation in online services due to poor infrastructure, a high rate of online fraud, slim awareness on the part of customers, and e-payment bottlenecks, among others. Finally, this study expounds a number of management options to stimulate the impact of online technology in marketing financial services in Nigeria.
Natural Radioactivity Measurements of Basalt Rocks in Aden governorate, South...IOSR Journals
The amounts of radioactivity in igneous rocks have been investigated; 63 basalt rock samples were collected from Aden governorate, South of Yemen. The activity concentrations of 226Ra, 232Th and 40K were measured using a NaI(Tl) detector. Across the study area, the radium equivalent activities Raeq of the samples under investigation were found to be in the range of 51.60 to 809.26 Bq/kg, with an average value of 237.01 Bq/kg; this average is below the internationally accepted value of 370 Bq/kg. To estimate the health effects of this natural radioactive composition, the average values of the absorbed gamma dose rate D (55 nGy h-1), the indoor and outdoor annual effective dose rates Eied (0.11 mSv y-1) and Eoed (0.03 mSv y-1), the external hazard index Hex (0.138), the internal hazard index Hin (0.154), and the representative level index Iγr (0.386) have been calculated and found to be higher than the worldwide average values.
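The radium-equivalent figure quoted above is conventionally computed with the standard weighting, under which 1 Bq/kg of 226Ra, 0.7 Bq/kg of 232Th and 13 Bq/kg of 40K are taken to produce the same gamma dose rate:

```python
def radium_equivalent(c_ra, c_th, c_k):
    """Ra-eq (Bq/kg) from 226Ra, 232Th and 40K activity concentrations,
    using the standard weighting Raeq = C_Ra + 1.43*C_Th + 0.077*C_K."""
    return c_ra + 1.43 * c_th + 0.077 * c_k
```

Applying this to each of the 63 samples and averaging is what yields the 237.01 Bq/kg figure compared against the 370 Bq/kg limit.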
Determination of nonlinear absorption (β) and refraction (n2)by the Z-scan me...IOSR Journals
Potassium pentaborate, a nonlinear optical (NLO) material, was synthesized by the solution growth method. The grown crystals were subjected to structural, optical and mechanical property studies. Crystals with excellent transparency were grown with a maximum size of 9 mm × 8 mm × 5 mm, and the grown crystals were characterized by single-crystal XRD, FT-IR, TGA-DTA & DSC, and UV-vis-NIR studies. The crystal belongs to the orthorhombic system with space group mm2, having unit-cell dimensions a = 11.068 Å, b = 11.175 Å, c = 9.058 Å and α = 90°, β = 90°, γ = 90°; Z = 4, at 298(2) K. The second-order nonlinear optical property of the polycrystalline sample has been confirmed by Kurtz-Perry powder SHG analysis. Third-order nonlinear optical properties were also studied by Z-scan techniques. The nonlinear absorption and nonlinear refractive index were found, and the third-order bulk susceptibility of the compound was also calculated.
A Comparative Study of Tourism Industry in North-Eastern States of IndiaIOSR Journals
Despite its many unique natural beauties, the North-eastern region of India remains among the least visited places in India. Though the Government of India has been providing financial assistance for the development of tourist infrastructure and for promotion and marketing, the performance and development of tourism differ markedly across the states. In this context, the present paper examines tourism in the North-eastern states of India. The major findings are that tourist inflows are highest in Assam and lowest in Nagaland, and that the tourism sector's share in Gross State Domestic Product (GSDP) is highest in Assam. As far as the growth rate of tourist influx is concerned, the growth rate in Arunachal Pradesh is much higher, at 55 percent, than in the other states. Further, the Density of Tourist Population (DTP) and the Per Capita Tourist arrivals (PCT) are highest in Sikkim, at 101.57 and 1.19 respectively, and lowest in Nagaland, at 1.35 and 0.01. The major constraints on North-east tourism development, and the causes of uneven performance among states, are insufficient funds, lack of infrastructure, poor transportation, lack of alternative means of transport, weak marketing, boundary issues, terror effects and the permit regime. Steps should be taken to remove these constraints to tourism development.
Effect of Image Quality Service and Schools Parents of Satisfaction in Surabaya.IOSR Journals
This study aimed to determine the effect of Quality of Service (X1) and School Image (X2), either partially or simultaneously, on Parents' Satisfaction (Y) in schools in Surabaya. This study has four hypotheses, namely: (1) there is a positive and significant relationship between service quality, taken partially, and parents' satisfaction in schools in Surabaya; (2) there is a positive and significant relationship between school image, taken partially, and Parents' Satisfaction (Y) in schools in Surabaya; (3) there is a positive and significant relationship between service quality and image, taken simultaneously, and parents' satisfaction in schools in Surabaya; and (4) of the two independent variables, image and quality of service, quality of service has the most significant effect on increasing parents' satisfaction in schools in Surabaya.
Optimization of Complete Monopole Antennato Exhibit Wideband Capabilities.IOSR Journals
Antennas used for early portable wireless handheld devices were the so-called whip antennas. The quarter-wavelength whip antenna was very popular, mostly because it is simple and convenient. It has an omnidirectional pattern in the plane of the earth when held upright and a gain satisfying the device's specifications. New antenna designs have appeared on radios with a lower profile than the whip antenna and without significantly reduced performance. These include the quarter-wavelength helical antenna and the "stubby" helical antenna, which is the shortest antenna available. In recent years, the demand for compact handheld communication devices has grown significantly, and devices smaller than palm size have appeared in the market. Antenna size is a major factor that limits device miniaturization. In the past few years, new designs based on the Planar Inverted-F Antenna (PIFA) and Microstrip Antennas (MSA) have been popular for handheld wireless devices because these antennas have a low-profile geometry instead of protruding as most antennas do on handheld radios. Conventional PIFAs and MSAs are compact, with a length that is approximately a quarter to a half of the wavelength. These antennas can be further optimized by adding new parameters to the design, such as strategically shaping the conductive plate or judiciously locating loads.
Global Perspective and Issues Relating to Product RecallIOSR Journals
The paper aims to analyze the various issues related to product recalls in a globalized world where the market has become a single trading place. A recall acts like an alarm that makes a company realize it is time to take corrective action by reviewing and making changes in the required areas. It is very important to understand the various issues related to product recalls, such as those concerning costs, the legal framework, recall planning, and management's perspective. The paper discusses the issues identified with recalls using examples of global companies that have faced one. Further, the paper also focuses on the recall framework with respect to India. The paper has practical implications both for academicians and for readers concerned with issues regarding product recalls.
Sports and Conflict Prevention; the Way Forward For Global PeaceIOSR Journals
Abstract: This paper discussed sports and conflict prevention by looking at the way forward for global peace.
Generally conflict is defined as a state of disagreement between persons or group of persons. The major causes
are usually differences in opinion, prejudice and discrimination, belief, and access to valued scarce resources.
Conflict is an important part of human existence and a natural part of our daily life. Conflict can either be
positive or negative depending on how it is handled. Conflict can emanate from various sources, within us,
school, home and community or the society at large. Achieving global peace means creating peace within
individuals, communities and the society. Sports as a veritable tool for social transformation can be effectively
utilized to achieve a certain level of peace among individuals and nations. This paper focused on the possibility
of utilizing the values of sports in creating understanding, tolerance, and respect for human dignity,
development of moral values and social integration as a vehicle for creating peace among nations. If nations
can achieve these qualities, then conflict can be minimized and global peace can be guaranteed. The influence
of sports on character formation and social cohesion has direct bearing on peaceful attitudes. The development
of emotional fitness, self-esteem, need for recognition, sense of belonging and feelings of anger, hostility and
aggression are all met through participation in competitive sports. The sports environment starting with the
athletes, coaches, umpires, spectators and vendors, should all portray peace and act in a manner in which peace can prevail. To prevent conflict and achieve global peace, sport managers, handlers and enthusiasts must develop the capacity to detect conflicting situations among nations and to develop strategies in sports to deal with them before they erupt.
Channel Equalization of WCDMA Downlink System Using Finite Length MMSE-DFEIOSR Journals
The performance of a WCDMA system deteriorates in a multipath fading environment. Fading destroys orthogonality and is responsible for multiple access interference (MAI). Though the conventional rake receiver provides reasonable performance in the WCDMA downlink due to path diversity, it does not restore orthogonality. A linear equalizer restores orthogonality and suppresses MAI, but it is not efficient, since its performance depends on the spectral characteristics of the channel. To overcome this, a Minimum Mean Square Error Decision Feedback Equalizer (MMSE-DFE) with a linear anticausal feedforward filter, a causal feedback filter and a simple detector is proposed in this paper. The filter taps of the finite-length DFE are derived using Cholesky factorization theory, and the equalizer is capable of suppressing noise, Intersymbol Interference (ISI) and MAI. This paper describes the WCDMA downlink system using the finite-length MMSE-DFE and takes into consideration the effects of interference, including additive white Gaussian noise, multipath fading, ISI and MAI. Furthermore, the performance is compared with the conventional rake receiver and the MMSE equalizer, and the simulation results are shown.
Implementation of Wireless Communication using Adaptive Beamforming of Smart ...IOSR Journals
Abstract: As the demand for mobile communications constantly increases, the need for better coverage, improved capacity and higher transmission quality rises; thus, a more efficient use of the radio spectrum is required. A smart antenna system is capable of efficiently utilizing the radio spectrum and is an effective solution to present wireless system problems, achieving reliable and robust high-speed, high-data-rate transmission. Smart antenna technology offers a significantly improved solution to reduce interference levels and improve system capacity, and it addresses these problems via an advanced signal processing technique called beamforming. Adaptive beamforming is used to enhance a desired signal while suppressing noise and interference at the output of an array of sensors. In this work, a robust adaptive beamforming algorithm using a smart-antenna base-station system is investigated, and its performance in the presence of multipath components and multiple users is evaluated. The capabilities of smart and adaptive antennas are readily employable in cognitive radio and OFDM systems. Keywords - Smart/adaptive antenna, beamforming, DSP, OFDM
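A minimal sketch of one common flavour of adaptive beamforming, the LMS algorithm on a uniform linear array with a known training reference; the array geometry, step size and training scheme are illustrative assumptions, not the paper's specific algorithm:

```python
import cmath
import math
import random

def steering(angle_deg, n, spacing=0.5):
    """Array response of an n-element uniform linear array (half-wavelength
    spacing) to a plane wave arriving from angle_deg off broadside."""
    a = math.radians(angle_deg)
    return [cmath.exp(-2j * math.pi * spacing * k * math.sin(a)) for k in range(n)]

def lms_beamform(snapshots, reference, mu=0.01):
    """Complex LMS: adapt weights w so the array output w^H x tracks a known
    reference (training) signal; returns the weights and |error| per step."""
    n = len(snapshots[0])
    w = [0j] * n
    errs = []
    for x, d in zip(snapshots, reference):
        y = sum(w[k].conjugate() * x[k] for k in range(n))
        e = d - y
        for k in range(n):
            w[k] += mu * x[k] * e.conjugate()   # w <- w + mu * x * e*
        errs.append(abs(e))
    return w, errs
```

As the weights converge, the array forms a beam toward the desired user and places nulls toward interferers, which is the interference-reduction mechanism the abstract describes.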
“An analytical study on medal tally of top ten countries in 2012 Olympic Game...IOSR Journals
The origin of the ancient Olympic Games is lost in the mists of pre-history, but for many centuries they were only a festival of the Greek people. The Games were first held in honour of the Greek god Zeus in 776 BC in the plain of the kingdom of Elis, nestled in a lush valley between the Alpheus River and Mount Kronion, 15 km from the Ionian Sea. The Olympiad celebrated that year was considered the first and was used to date subsequent historic events.
A Survey of Image Segmentation based on Artificial Intelligence and Evolution...IOSR Journals
Abstract : In image analysis, segmentation is the partitioning of a digital image into multiple regions (sets of
pixels), according to some homogeneity criterion. The problem of segmentation is a well-studied one in
literature and there are a wide variety of approaches that are used. Different approaches are suited to different
types of images, and the quality of output of a particular algorithm is difficult to measure quantitatively because
there may be many correct segmentations for a single image. Image segmentation denotes a process
by which a raw input image is partitioned into nonoverlapping regions such that each region is homogeneous
and the union of any two adjacent regions is heterogeneous. A segmented image is considered to be the highest
domain-independent abstraction of an input image. Image segmentation is an important processing step in many
image, video and computer vision applications. Extensive research has been done in creating many different
approaches and algorithms for image segmentation, but it is still difficult to assess whether one algorithm
produces more accurate segmentations than another, whether it be for a particular image or set of images, or
more generally, for a whole class of images.
In this paper, we survey the image segmentation methods using artificial intelligence and evolutionary
approaches that have been proposed in the literature. The rest of the paper is organized as follows: 1.
Introduction; 2. Literature review; 3. Noteworthy contributions in the field of the proposed work; 4. Proposed
methodology; 5. Expected outcome of the proposed research work; 6. Conclusion.
Keywords: Image Segmentation, Segmentation Algorithm, Artificial Intelligence, Evolutionary Algorithm,
Neural Network, Fuzzy Set, Clustering.
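As one concrete instance of the homogeneity-based segmentation defined above, a 1-D k-means on pixel intensities partitions an image into internally homogeneous regions; this is a toy sketch for intuition, not any of the surveyed algorithms:

```python
def kmeans_segment(pixels, k=2, iters=20):
    """1-D k-means on intensities: each cluster is a homogeneous intensity
    class; adjacent pixels in different clusters form region boundaries."""
    # simple seeding: extremes for k=2, first k values otherwise
    centers = [min(pixels), max(pixels)] if k == 2 else list(pixels[:k])
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pixels:
            groups[min(range(k), key=lambda c: abs(p - centers[c]))].append(p)
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    labels = [min(range(k), key=lambda c: abs(p - centers[c])) for p in pixels]
    return labels, centers
```

Each resulting label map satisfies the definition above: every region is homogeneous in intensity, and merging two differently labeled regions would not be.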
Evaluation of Deformation and Fracture Behavior of Alloy 90 SheetsIOSR Journals
Abstract: Alloy 90 belongs to a family of austenitic nickel-based superalloys. Nimonic alloys typically consist of roughly 80% nickel and 20% chromium, with additives such as titanium and aluminium. Nickel-based superalloys, among several high-temperature structural alloys, are the prime materials for numerous advanced high-temperature structural components. Several advanced processing technologies have also evolved, such as isothermal forging, equal-channel angular extrusion, investment casting, directional solidification and single-crystal technologies, similar and dissimilar metal joining, and destructive and non-destructive testing.
In the past, a few attempts have been made to study the deformation and fracture behavior of Alloy 90 sheets for sheet-metal applications. However, these studies were limited to different commercial grades, such as cold-rolled sheets of thicknesses up to 2 mm. None of these studies addressed the influence of microstructure and texture for ultra-thin sheet applications. Hence, a comprehensive study has been undertaken to evaluate the ambient-temperature deformation characteristics as a function of the degree of cold rolling and ageing.
To determine the tensile properties, tensile tests were conducted on Alloy 90 sheets of 1 mm and 0.5 mm thickness in different heat-treatment conditions and in different specimen orientations, namely R, R+30°, R+45°, R+60° and RT. The fracture behavior of the alloy sheets was studied to determine the mode of fracture. The present work includes a comparison of the tensile properties of macro and micro specimens of Alloy 90 sheet; the properties evaluated include the tensile flow behavior in various microstructural conditions such as cold rolled, solution treated, and aged for different times.
Keywords: Formability, Alloy 90, Impact
Anisotropic Bianchi Type-III Dark Energy Model with Time-Dependent Decelerati...IOSR Journals
An anisotropic Bianchi type-III cosmological model is investigated in the Saez-Ballester scalar-tensor theory of gravitation. Three different time-dependent skewness parameters along the spatial directions are introduced to represent the deviation of pressure from isotropy. To obtain deterministic solutions of the field equations, we choose the variation law of the scale factor S = (t^r e^t)^(1/l), which yields a time-dependent deceleration parameter (DP), representing a model that generates a transition of the universe from the early decelerating phase to the present accelerating phase. Some physical and geometrical properties of the model are also discussed.
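Assuming the scale-factor law is read as S = (t^r e^t)^(1/l), the time dependence of the deceleration parameter follows directly; this short derivation is ours, for orientation, not reproduced from the paper:

```latex
S = \left(t^{r} e^{t}\right)^{1/l}, \qquad
H = \frac{\dot S}{S} = \frac{1}{l}\left(\frac{r}{t} + 1\right), \qquad
\dot H = -\frac{r}{l\,t^{2}},
\qquad\Longrightarrow\qquad
q = -1 - \frac{\dot H}{H^{2}} = -1 + \frac{l\,r}{(r+t)^{2}}.
```

So q tends to -1 + l/r at early times (deceleration whenever l > r) and to -1 at late times (acceleration), reproducing the claimed decelerating-to-accelerating transition.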
Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...IOSR Journals
In a wireless communication system, transmitted signals are subjected to multiple reflections, diffractions and attenuation caused by obstacles such as buildings and hills. At the receiver end, multiple copies of the transmitted signal are received that arrive at clearly distinguishable time instants and are faded by signal cancellation. A rake receiver is a technique to combine these so-called multi-paths [2] by utilizing multiple correlation receivers allocated to the delay positions at which significant energy arrives, which achieves a significant improvement in the SNR of the output signal. This paper shows how the rake, including despreading and descrambling, could be replaced by a receiver that can be implemented on a CORDIC-based hardware architecture. The performance, in conjunction with the computational requirements of the receiver, is widely adjustable and is significantly better than that of the conventional rake receiver.
Sustaining Value Creation through Knowledge of Customer ExpectationsIOSR Journals
As the pursuit of knowledge becomes increasingly central to firms' competitiveness, we argue that knowing what the customer expects of product offerings is a prerequisite for sustaining the delivery of value. Thus, this paper seeks to provide a theoretical contribution to the growing recognition in research of the customer as a source of firms' competence. Building on the extant literature on value creation, customer satisfaction/dissatisfaction, and theories of firm knowledge creation, we propose a framework for how firms can sustain value creation through knowledge of customer expectations. We argue that sustaining firms' value creation resides in the ability of firms to continuously anticipate, integrate and configure knowledge of customer expectations to create product offerings that meet or exceed customer expectations and generate better economic returns than those of competing firms.
Machine learning for text document classification-efficient classification ap...IAESIJAI
Numerous alternative methods for text classification have been created because of the increase in the amount of online text information available. The cosine similarity classifier is the most extensively utilized simple and efficient approach, and it improves text classification performance when combined with the estimated values provided by conventional classifiers such as multinomial naive Bayes (MNB). Combining the similarity between a test document and a category with the estimated value for that category thus enhances the performance of the classifier. This approach provides a text document categorization method that is both efficient and effective. In addition, a method for determining the proper relationship between the set of words in a document and its categorization is also obtained.
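The combination scheme described, blending cosine similarity to a category with an MNB estimate, might be sketched as follows; the 50/50 blending weight `lam` and the term-count centroid representation are our assumptions for illustration:

```python
import math
from collections import Counter, defaultdict

def train_mnb(docs, labels, alpha=1.0):
    # multinomial naive Bayes with Laplace smoothing (alpha)
    counts, totals, prior, vocab = defaultdict(Counter), Counter(), Counter(labels), set()
    for d, l in zip(docs, labels):
        for t in d.split():
            counts[l][t] += 1
            totals[l] += 1
            vocab.add(t)
    return counts, totals, prior, vocab, alpha

def mnb_logprob(model, doc, label):
    counts, totals, prior, vocab, alpha = model
    lp = math.log(prior[label] / sum(prior.values()))
    for t in doc.split():
        lp += math.log((counts[label][t] + alpha) / (totals[label] + alpha * len(vocab)))
    return lp

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(model, centroids, doc, lam=0.5):
    # blend cosine-to-category-centroid with the normalized MNB posterior
    names = list(centroids)
    lps = [mnb_logprob(model, doc, l) for l in names]
    m = max(lps)
    post = [math.exp(p - m) for p in lps]
    z = sum(post)
    dv = Counter(doc.split())
    scores = {l: lam * cosine(dv, centroids[l]) + (1 - lam) * post[i] / z
              for i, l in enumerate(names)}
    return max(scores, key=scores.get)
```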
Feature selection, optimization and clustering strategies of text documentsIJECEIAES
Clustering is one of the most researched areas of data-mining applications in the contemporary literature. The need for efficient clustering is observed across wide sectors, including consumer segmentation, categorization, collaborative filtering, document management, and indexing. Research into the clustering task must be performed before adapting it to the text environment. Conventional approaches typically emphasized quantitative information, where the selected features are numeric. Efforts have also been put forward toward efficient clustering in the context of categorical information, where the selected features can assume nominal values. This manuscript presents an in-depth analysis of the challenges of clustering in the text environment. Further, this paper also details prominent models proposed for clustering, along with the pros and cons of each model. In addition, it focuses on various recent developments in the clustering task in social networks and associated environments.
Text mining is a new and exciting research area that tries to solve the information-overload problem by using techniques from machine learning, natural language processing (NLP), data mining, information retrieval (IR), and knowledge management. Text mining involves the pre-processing of document collections, such as information extraction, term extraction, text categorization, and the storage of intermediate representations. Techniques such as clustering, distribution analysis, association rules and visualisation are then used to analyse these intermediate representations.
Building a recommendation system based on the job offers extracted from the w...IJECEIAES
Recruitment, or job search, is increasingly carried out throughout the world by a large population of users through various channels, such as websites, platforms, and professional networks. Given the large volume of information related to job descriptions and user profiles, it is complicated to appropriately match a user's profile with a job description, and vice versa. The traditional job-search approach has drawbacks, since the job seeker needs to search for job offers on each recruitment platform, manage multiple accounts, and apply for the relevant vacancies, which wastes considerable time and effort. The contribution of this research work is the construction of a recommendation system based on job offers extracted from the web and on the e-portfolios of job seekers. After extraction of the data, natural language processing is applied so that the structured data are ready for filtering and analysis. The proposed system is content-based: it measures the degree of correspondence between the attributes of the e-portfolio and those of each job offer in the same list of competence specialties using the Euclidean distance, and the results are ranked in decreasing order so as to display job offers from the most to the least relevant.
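The Euclidean matching step can be sketched directly; the sparse competence-vector encoding below is an illustrative assumption, not the paper's actual attribute schema:

```python
import math

def euclidean(a, b):
    # distance between sparse competence vectors, e.g. {"python": 3, "sql": 2}
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in keys))

def rank_offers(portfolio, offers):
    """Rank job offers by Euclidean distance between the e-portfolio's
    competence vector and each offer's required competences (closest first)."""
    return sorted(offers, key=lambda o: euclidean(portfolio, o["skills"]))
```

Sorting by ascending distance produces exactly the most-relevant-first listing the abstract describes.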
Review of Various Text Categorization Methodsiosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
The enormous amount of information stored in unstructured texts cannot simply be used for further
processing by computers, which typically handle text as simple sequences of character strings. Therefore, specific
(pre-) processing methods and algorithms are required in order to extract useful patterns. Text Mining is the
discovery of valuable, yet hidden, information from the text document. Text classification (Also called Text
Categorization) is one of the important research issues in the field of text mining. It is necessary to
classify/categorize large texts (documents) into specific classes. Text Classification assigns a text document to one of a
set of predefined classes. This paper covers different text classification techniques and also includes Classifier
Architecture and Text Classification Applications.
Automated hierarchical classification of scanned documents using convolutiona...IJECEIAES
This research proposed automated hierarchical classification of scanned documents with characteristics content that have unstructured text and special patterns (specific and short strings) using convolutional neural network (CNN) and regular expression method (REM). The research data using digital correspondence documents with format PDF images from Pusat Data Teknologi dan Informasi (Technology and Information Data Center). The document hierarchy covers type of letter, type of manuscript letter, origin of letter and subject of letter. The research method consists of preprocessing, classification, and storage to database. Preprocessing covers extraction using Tesseract optical character recognition (OCR) and formation of word document vector with Word2Vec. Hierarchical classification uses CNN to classify 5 types of letters and regular expression to classify 4 types of manuscript letter, 15 origins of letter and 25 subjects of letter. The classified documents are stored in the Hive database in Hadoop big data architecture. The amount of data used is 5200 documents, consisting of 4000 for training, 1000 for testing and 200 for classification prediction documents. The trial result of 200 new documents is 188 documents correctly classified and 12 documents incorrectly classified. The accuracy of automated hierarchical classification is 94%. Next, the search of classified scanned documents based on content can be developed.
Survey of Machine Learning Techniques in Textual Document Classification
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 16, Issue 1, Ver. III (Jan. 2014), PP 17-21
www.iosrjournals.org 17 | Page
S.W. Mohod (Dept. of Computer Engineering, B.D. College of Engineering, Sevagram, Wardha, India)
Dr. C.A. Dhote (Prof., Ram Meghe Institute of Technology & Research, Badnera, Amravati, India)
Abstract: Text document classification associates one or more predefined categories with a document, based on the likelihood expressed by a training set of labeled documents. Machine learning algorithms play an important role in training a system with predefined categories. The growing importance of the machine learning approach motivated this study of text document classification based on the available statistical event models. The aim of this paper is to present the important techniques and methodologies employed for text document classification, while also drawing attention to some of the interesting challenges that remain to be solved, focusing mainly on text representation and machine learning techniques.
Keywords: Text mining, Web mining, Documents classification, Information retrieval, Event models.
I. Introduction
With the rapid growth of the World Wide Web and the increasing availability of electronic documents, automatic categorization of documents has become important for organizing information and for knowledge discovery. Proper categorization of electronic documents, online news, blogs, e-mails and digital libraries requires text mining, machine learning and natural language processing techniques to extract the required knowledge. The term "text document" refers to a written, printed, or online document that presents or communicates narrative or tabulated data in the form of an article, letter, memorandum, report, etc. Text expresses a vast range of information, but encodes it in a form that is difficult to decipher automatically. In today's online world, a huge amount of textual information is available in databases and other sources, in both structured and unstructured form. Unstructured data is data that does not reside in fixed locations; the term generally refers to free-form text, which is present everywhere. Data that resides in fixed fields within a record or file is termed structured data; relational databases and spreadsheets are examples of structured data.
In reality, a large portion of the available information does not appear in structured databases but rather in collections of text articles drawn from various sources. Unstructured information refers to computerized information that either does not have a data model or is not easily usable by a computer program; the term distinguishes such information from data stored in field form in databases or annotated in documents.
However, data mining deals with structured data, whereas text is unstructured and presents special characteristics. The important task is to properly retrieve, present and classify these documents. Extraction, integration and classification of electronic documents from different sources, and the discovery of knowledge from these documents, are therefore important.
In data mining, machine learning is often used for prediction or classification. Classification involves finding rules that partition the data into disjoint groups. The input for classification is a training data set whose class labels are already known. A classifier analyzes the training data set and constructs a model based on the class labels. The goal of classification is to build a set of models that can correctly predict the class of different objects. Machine learning is an area of artificial intelligence concerned with the development of techniques that allow computers to "learn"; more specifically, it is a method for creating computer programs through the analysis of data sets.
Some machine learning systems attempt to eliminate the need for human intuition in the analysis of the
data, while others adopt a collaborative approach between human and machine. Human intuition cannot be
entirely eliminated since the designer of the system must specify how the data are to be represented and what
mechanisms will be used to search for a characterization of the data. Machine learning has a wide spectrum of
applications including search engines, medical diagnosis, detecting credit card fraud, stock market analysis,
classifying DNA sequences, speech and handwriting recognition, game playing and robot locomotion.
II. Document Representation
Document representation is a pre-processing technique used to reduce the complexity of documents: each document is transformed from its full-text version into a document vector. Text classification is an important component of most information management tasks, for which algorithms that maintain high accuracy are desired. Dimensionality reduction is a very important step in text classification, because irrelevant and redundant features often degrade classification performance in both speed and accuracy. Dimensionality reduction techniques can be classified into the feature extraction (FE) [1] and feature selection (FS) approaches described below.
1 Feature Extraction
FE is the first pre-processing step, which presents the text documents in a clear word format; removing stop words and stemming are the main pre-processing tasks [2] [3]. Documents in text classification are represented by a great number of features, most of which may be irrelevant or noisy [4]. Dimensionality reduction is the exclusion of a large number of keywords, based preferably on a statistical process, to create a low-dimensional vector [5]. Commonly, the steps taken for feature extraction (Fig. 1) are:
Tokenization: a document is treated as a string and then partitioned into a list of tokens.
Removing stop words: stop words such as "the", "a", "and", etc. occur frequently, so these insignificant words are removed.
Stemming: a stemming algorithm converts different word forms into a similar canonical form; this step conflates tokens to their root form, e.g. "connection" to "connect", "computing" to "compute".
Figure 1: Document Classification Process (Read Document → Tokenize Text → Remove Stopwords → Stemming → Vector Representation → Feature Selection → Learning Algorithm)
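As a concrete illustration, the three extraction steps above can be sketched in a few lines of Python. The stop-word list and the suffix-stripping stemmer here are toy stand-ins, not the procedures used by the surveyed systems (a real system would use a full stop-word list and a stemmer such as Porter's):

```python
import re

# Toy stop-word list; real systems use much larger lists.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is"}

def stem(token):
    """Crude suffix-stripping stemmer (a stand-in for e.g. Porter's algorithm)."""
    for suffix in ("ing", "tion", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[:-len(suffix)]
    return token

def preprocess(document):
    # Tokenization: treat the document as a string, partition into tokens.
    tokens = re.findall(r"[a-z]+", document.lower())
    # Stop-word removal: drop insignificant high-frequency words.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Stemming: conflate each token to a root form.
    return [stem(t) for t in tokens]

print(preprocess("Computing a connection to the connected servers"))
# ['comput', 'connec', 'connect', 'server']
```

Note how the crude stemmer maps "computing" and "connected" close to their roots; a proper stemmer would conflate "connection" and "connected" to the same form.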
2 Feature Selection
After feature extraction, the next important pre-processing step in text classification is feature selection, used to construct the vector space; it improves the scalability, efficiency and accuracy of a text classifier. In general, a good feature selection method should consider domain and algorithm characteristics [6]. The main idea of FS is to select a subset of features from the original documents. FS is performed by keeping the words with the highest scores according to a predetermined measure of word importance [4]. The selected features retain their original physical meaning, providing a better understanding of the data and the learning process [1]. A major problem in text classification is the high dimensionality of the feature space. Almost every text domain has a large number of features, most of which are neither relevant nor beneficial for the classification task, and some noisy features may sharply reduce classification accuracy [7]. Hence FS is commonly used in text classification to reduce the dimensionality of the feature space and improve the efficiency and accuracy of classifiers.
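A minimal sketch of score-based feature selection along the lines described above, using document frequency as the importance measure (chi-square and information gain are common alternatives in the literature; the example documents are illustrative):

```python
from collections import Counter

def select_features(tokenized_docs, k):
    """Keep the k terms with the highest document frequency."""
    df = Counter()
    for doc in tokenized_docs:
        df.update(set(doc))              # count each term once per document
    ranked = sorted(df, key=lambda t: (-df[t], t))  # ties broken alphabetically
    return ranked[:k]

docs = [["svm", "kernel", "margin"],
        ["svm", "text", "margin"],
        ["bayes", "text", "prior"]]
print(select_features(docs, 2))          # ['margin', 'svm']
```

Documents would then be re-represented as vectors over only the selected terms, shrinking the feature space before learning.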
III. Machine Learning Algorithms
Documents can be classified in three ways: unsupervised, supervised and semi-supervised. Many techniques and algorithms have recently been proposed for the clustering and classification of electronic documents. This section focuses on supervised classification techniques and new developments, and highlights some of the opportunities and challenges reported in the existing literature. Automatic classification of documents into predefined categories has attracted active attention as internet usage has rapidly grown.
Over the last few years, the task of automatic text classification has been extensively studied and rapid progress has been made, including machine learning approaches such as the Bayesian classifier, decision trees, k-nearest neighbor (KNN), support vector machines (SVMs), neural networks, latent semantic analysis, Rocchio's algorithm, fuzzy correlation and genetic algorithms. Normally, supervised learning techniques are used for automatic text classification, where predefined category labels are assigned to documents based on the likelihood suggested by a training set of labeled documents. Some of these techniques are described below.
1 Rocchio’s Algorithm
Rocchio's algorithm [8] is a vector space method for document routing or filtering in information retrieval. It builds a prototype vector for each class using a training set of documents, i.e. the average vector over all training document vectors that belong to class ci, then calculates the similarity between a test document and each prototype vector, and assigns the test document to the class with maximum similarity.
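The prototype-vector idea can be sketched as follows; the class names and toy term-frequency vectors are illustrative, and cosine similarity is used as the similarity measure:

```python
import math

def prototype(vectors):
    """Average vector over all training document vectors of one class."""
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rocchio_classify(doc_vec, prototypes):
    # Assign the test document to the class with maximum similarity.
    return max(prototypes, key=lambda c: cosine(doc_vec, prototypes[c]))

prototypes = {
    "sports":   prototype([[1, 0, 3], [2, 0, 1]]),
    "politics": prototype([[0, 4, 0], [1, 3, 0]]),
}
print(rocchio_classify([0, 2, 1], prototypes))   # politics
```

Training is just one averaging pass per class, which is why Rocchio classifiers are cheap to build and update.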
2 K-nearest neighbor (k-NN)
The k-nearest neighbor algorithm (k-NN) [9] tests the degree of similarity between a document and the k nearest training documents, storing the classification data and thereby determining the category of test documents. It is an instance-based learning algorithm that categorizes objects based on the closest feature vectors in the training set [10]. The training set is mapped into a multi-dimensional feature space, which is partitioned into regions according to the categories of the training set. A point in the feature space is assigned to a particular category if that is the most frequent category among its k nearest training examples; Euclidean distance is typically used to compute the distance between vectors. The key element of this method is the availability of a similarity measure for identifying neighbors of a particular document [10]. The training phase consists only of storing the feature vectors and categories of the training set. In the classification phase, the distances from the new vector, representing an input document, to all stored vectors are computed and the k closest samples are selected; the category of the document is then predicted from the categories of these nearest neighbors.
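A minimal k-NN classifier along these lines, using Euclidean distance and majority voting over the k closest stored vectors (the feature vectors and labels are illustrative):

```python
import math
from collections import Counter

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def knn_classify(query, training, k=3):
    """training is a list of (feature_vector, category) pairs."""
    # The training phase was only storage; classification computes all distances.
    nearest = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(category for _, category in nearest)
    return votes.most_common(1)[0][0]    # most frequent category among the k

train = [([1.0, 1.0], "spam"), ([1.2, 0.9], "spam"),
         ([5.0, 5.0], "ham"), ([4.8, 5.2], "ham"), ([5.1, 4.9], "ham")]
print(knn_classify([1.1, 1.0], train, k=3))      # spam
```

Because classification scans every stored vector, k-NN trades fast training for slow prediction on large collections.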
3 Decision Tree
A decision tree rebuilds the manual categorization of training documents by constructing well-defined true/false queries in the form of a tree structure. In a decision tree, leaves represent the corresponding categories of documents and branches represent the conjunctions of features that lead to those categories. A well-organized decision tree can easily classify a document: the document is placed at the root node and run through the query structure until it reaches a leaf, which represents the classification goal for the document.
4 Decision Rules Classification
This method uses rule-based inference to classify documents into their annotated categories [11]. The algorithms construct a rule set that describes the profile of each category. Rules are typically constructed in the format "IF condition THEN conclusion", where the condition portion is filled by features of the category, and the conclusion portion is the category's name or another rule to be tested. The rule set for a particular category is then constructed by combining the separate rules of that category with logical operators, typically "and" and "or". During classification, not every rule in the rule set necessarily needs to be satisfied. When handling a data set with a large number of features per category, implementing heuristics is recommended to reduce the size of the rule set without affecting classification performance.
5 Naïve Bayes Algorithm
The naïve Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong independence assumptions. A more descriptive term for the underlying probability model would be "independent feature model". These independence assumptions make the order of features irrelevant, so the presence of one feature does not affect other features in the classification task [12]. They make the computation of the Bayesian classification approach more efficient, but severely limit its applicability. Depending on the precise nature of the probability model, naïve Bayes classifiers can be trained very efficiently, requiring a relatively small amount of training data to estimate the parameters necessary for classification. Because the variables are assumed independent, only the variances of the variables for each class need to be determined, not the entire covariance matrix.
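A sketch of a multinomial naïve Bayes classifier in this spirit; the Laplace (add-one) smoothing is a standard practical addition rather than something the text above prescribes, and the training documents are illustrative:

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_docs):
    """labeled_docs is a list of (token_list, category) pairs."""
    class_docs = Counter()               # documents per class -> prior P(c)
    word_counts = defaultdict(Counter)   # word frequencies per class
    vocab = set()
    for tokens, cat in labeled_docs:
        class_docs[cat] += 1
        word_counts[cat].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab

def classify_nb(tokens, model):
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best, best_logp = None, -math.inf
    for cat in class_docs:
        # log P(c) + sum of log P(w|c); features are assumed independent.
        logp = math.log(class_docs[cat] / total_docs)
        total_words = sum(word_counts[cat].values())
        for w in tokens:
            # Laplace smoothing avoids zero probability for unseen words.
            logp += math.log((word_counts[cat][w] + 1) / (total_words + len(vocab)))
        if logp > best_logp:
            best, best_logp = cat, logp
    return best

model = train_nb([(["ball", "goal", "team"], "sports"),
                  (["vote", "party", "election"], "politics"),
                  (["goal", "match"], "sports")])
print(classify_nb(["goal", "team"], model))      # sports
```

The independence assumption shows up directly in the inner loop: each word contributes its log-probability independently of the others.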
6 Artificial Neural Network
ANNs are constructed from a large number of elements with an input fan-in orders of magnitude larger than that of the computational elements of traditional architectures [13] [14]. These elements, namely artificial neurons, are interconnected into groups using a mathematical model for information processing based on a connectionist approach to computation. A neural network makes its neurons sensitive to the stored items, and can be used for distortion-tolerant storage of a large number of cases represented by high-dimensional vectors.
7 Fuzzy Correlation
Fuzzy correlation can deal with fuzzy information or incomplete data, and can also convert property values
into fuzzy sets for multiple-document classification [15]. Researchers have recently shown great interest in
using fuzzy rules and sets to improve classification accuracy, by incorporating fuzzy correlation or
fuzzy logic into machine learning algorithms and feature selection methods.
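As a hedged sketch (the exact formulation in the cited thesis [15] may differ), one simple notion of fuzzy correlation measures how similarly two fuzzy sets grade the same collection of documents, computed over their membership values in [0, 1]:

```python
import math

def fuzzy_correlation(mu_a, mu_b):
    """Correlation between the membership grades of two fuzzy sets.

    mu_a, mu_b: membership degrees in [0, 1] of the same documents in
    fuzzy sets A and B. Returns a value in [-1, 1]; values near 1 mean
    the sets grade the documents similarly (illustrative formulation).
    """
    n = len(mu_a)
    ma = sum(mu_a) / n
    mb = sum(mu_b) / n
    cov = sum((a - ma) * (b - mb) for a, b in zip(mu_a, mu_b))
    var_a = sum((a - ma) ** 2 for a in mu_a)
    var_b = sum((b - mb) ** 2 for b in mu_b)
    return cov / math.sqrt(var_a * var_b)
```

In a classifier, such a score could indicate whether two fuzzified features (or a feature and a category) carry redundant or complementary information; the fuzzification step itself (mapping raw property values to membership degrees) is assumed to happen beforehand.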
8 Genetic Algorithm
Genetic algorithms [16] aim to find optimal characteristic parameters using the mechanisms of genetic
evolution and survival of the fittest found in natural selection. Genetic algorithms make it possible to remove
misleading judgments from the algorithms and improve the accuracy of document classification. The genetic
algorithm is an adaptive, probabilistic, global optimization algorithm that simulates biological and genetic
evolution in a natural environment, and it is widely used for its simplicity and robustness. Several researchers
have applied this method to improve the text classification process. In [17] the authors introduced a genetic
algorithm for text categorization, used it to build and optimize the user template, and also introduced simulated
annealing to overcome the shortcomings of the genetic algorithm. Their experimental analysis shows that the
improved method is feasible and effective for text classification.
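The evolutionary loop described above (selection, crossover, mutation) can be sketched for feature selection over a bit mask. The fitness function and the `USEFUL` set below are toy assumptions standing in for a real measure such as classifier accuracy on a validation set:

```python
import random

# Hypothetical set of informative feature indices (assumption for illustration).
USEFUL = {0, 3, 5, 7}

def fitness(mask):
    # Toy fitness: reward selected useful features, penalize selected noise.
    return sum(1 if i in USEFUL else -1 for i, bit in enumerate(mask) if bit)

def genetic_feature_selection(n_features=10, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    # Each individual is a bit mask: 1 = feature selected.
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]           # truncation selection (elitist)
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_features)     # one-point crossover
            child = p1[:cut] + p2[cut:]
            child[rng.randrange(n_features)] ^= 1  # single-bit mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)
```

Because the survivors are carried over unchanged, the best fitness never decreases across generations; [17] additionally interleaves simulated annealing to escape the local optima this greedy pressure can create.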
9 Support Vector Machine (SVM)
SVMs are discriminative classification methods that are commonly recognized to be among the most
accurate. The SVM classification method is based on the Structural Risk Minimization principle from
computational learning theory [18]. The idea of this principle is to find a hypothesis that guarantees the lowest
true error. In addition, SVMs are well founded theoretically and are very open to theoretical understanding and
analysis [19]. An SVM needs both a positive and a negative training set, which is uncommon among
classification methods. These positive and negative examples are needed for the SVM to seek the decision
surface that best separates the positive from the negative data in the n-dimensional space, the so-called
hyperplane. The document representatives that are closest to the decision surface are called support vectors.
The performance of the SVM classifier remains unchanged if documents that do not belong to the support
vectors are removed from the set of training data [12].
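A minimal linear SVM can be sketched with Pegasos-style stochastic subgradient descent on the hinge loss; this is one standard training scheme, not necessarily the one used in [18] or [19]. Labels must be +1/-1, and a constant feature appended to each vector plays the role of the bias:

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style training of a linear SVM (illustrative sketch).

    Minimizes hinge loss plus L2 regularization lam/2 * ||w||^2.
    X: list of feature vectors; y: labels in {+1, -1}.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(n), n):          # shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)                  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            if margin < 1:
                # Point inside the margin: shrink w and step toward y_i * x_i.
                w = [(1 - eta * lam) * wj + eta * y[i] * xj
                     for wj, xj in zip(w, X[i])]
            else:
                # Correctly classified with margin: regularization step only.
                w = [(1 - eta * lam) * wj for wj in w]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

Note how only the points with margin below 1 (the support-vector candidates) contribute a data term to the update, mirroring the observation above that non-support-vector documents do not affect the final classifier.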
IV. Conclusion
This paper provides a review of machine learning approaches and document representation techniques.
An analysis of feature selection methods and classification algorithms is also presented. It was observed from
the study that Information Gain and chi-square (χ²) statistics are the most commonly used and best-performing
methods for feature selection; however, many other FS methods are recommended, either as single or as hybrid
techniques. More work is required to improve the performance and accuracy of the document classification
process. New methods and solutions are required to extract useful knowledge from the increasing volume of
electronic documents.
REFERENCES
[1] Liu, H. and Motoda, H., "Feature Extraction, Construction and Selection: A Data Mining Perspective", Boston, Massachusetts (MA):
Kluwer Academic Publishers.
[2] Wang, Y., and Wang, X.J., "A New Approach to Feature Selection in Text Classification", Proceedings of the 4th International
Conference on Machine Learning and Cybernetics, IEEE, Vol. 6, pp. 3814-3819, 2005.
[3] Lee, L.W., and Chen, S.M., "New Methods for Text Categorization Based on a New Feature Selection Method and a New Similarity
Measure Between Documents", IEA/AIE, France, 2006.
[4] Montanes, E., Fernandez, J., Diaz, I., Combarro, E.F. and Ranilla, J., "Measures of Rule Quality for Feature Selection in Text
Categorization", 5th International Symposium on Intelligent Data Analysis, Germany, Springer-Verlag, Vol. 2810, pp. 589-598, 2003.
[5] Manomaisupat, P., and Ahmad, K., "Feature Selection for Text Categorization Using Self-Organizing Map", 2nd International
Conference on Neural Network and Brain, IEEE Press, Vol. 3, pp. 1875-1880, 2005.
[6] Zi-Qiang Wang, Xia Sun, De-Xian Zhang, Xin Li, "An Optimal SVM-Based Text Classification Algorithm", Fifth International
Conference on Machine Learning and Cybernetics, Dalian, pp. 13-16, 2006.
[7] Jingnian Chen, Houkuan Huang, Shengfeng Tian, Youli Qu, "Feature selection for text classification with Naïve Bayes",
Expert Systems with Applications 36, pp. 5432-5435, 2009.
[8] Rocchio, J., "Relevance Feedback in Information Retrieval", in G. Salton (ed.), The SMART System, pp. 67-88.
[9] Tam, V., Santoso, A., & Setiono, R. , “A comparative study of centroid-based, neighborhood-based and statistical approaches for
effective document categorization”, Proceedings of the 16th International Conference on Pattern Recognition, pp.235–238, 2002.
[10] Eui-Hong (Sam) Han, George Karypis, Vipin Kumar; “Text Categorization Using Weighted Adjusted k-Nearest Neighbor
Classification”, Department of Computer Science and Engineering. Army HPC Research Centre, University of Minnesota,
Minneapolis, USA. 1999.
[11] Chidanand Apte, Fred Damerau, Sholom M. Weiss.; “Towards Language independent Automated Learning of Text Categorization
Models”, In Proceedings of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information
Retrieval, pp. 23-30,1994.
[12] Heide Brücher, Gerhard Knolmayer, Marc-André Mittermayer; “Document Classification Methods for Organizing Explicit
Knowledge”, Research Group Information Engineering, Institute of Information Systems, University of Bern, Engehaldenstrasse 8,
CH - 3012 Bern, Switzerland, 2002.
[13] Miguel E. Ruiz, Padmini Srinivasan; “Automatic Text Categorization Using Neural Network”,In Proceedings of the 8th ASIS
SIG/CR Workshop on Classification Research, pp. 59-72. 1998.
[14] Petri Myllymaki, Henry Tirri, "Bayesian Case-Based Reasoning with Neural Networks", In Proceedings of the IEEE International
Conference on Neural Networks '93, Vol. 1, pp. 422-427, 1993.
[15] Que, H.-E., "Applications of fuzzy correlation on multiple document classification", Unpublished master thesis, Information
Engineering Department, Tamkang University, Taipei, Taiwan, 2000.
[16] Wang Xiaoping, Li-Ming Cao, "Genetic Algorithm Theory, Application and Software [M]", Xi'an: Xi'an Jiaotong University Press,
2002.
[17] ZHU Zhen-fang, LIU Pei-yu, LU Ran, "Research of text classification technology based on genetic annealing algorithm", IEEE,
978-0-7695-3311-7/08, 2008.
[18] Vladimir N. Vapnik, "The Nature of Statistical Learning Theory", Springer, New York, 1995.
[19] Thorsten Joachims, “Text Categorization with Support Vector Machines: Learning with Many Relevant Features” ECML-98, 10th
European Conference on Machine Learning, pp. 137-142. 1998.