This document summarizes a research talk on categorizing text documents efficiently with a massively parallel machine learning model. The model was used to classify the Yomiuri Newspaper Corpus, which contains more than two million Japanese newspaper articles in 75 categories, on an IBM P690 server. The talk compares several feature extraction methods and classifiers, including k-NN and support vector machines, and reports that the proposed parallel model is effective for large-scale text categorization.

Efficient Text Categorization Using a Massively Parallel Machine Learning Model

LU Bao-Liang
Department of Computer Science and Engineering
Shanghai Jiao Tong University
blu@cs.sjtu.edu.cn

Abstract: In this talk, we present our recent research progress on categorizing Japanese text documents
with a massively parallel machine learning model. With this model, we can address the problem of
classifying large-scale document sets using advanced computing resources such as cluster and grid
computing systems. We perform experiments on an IBM P690 server to classify the Yomiuri Newspaper
Corpus, which includes over two million text documents in 75 different categories. We compare
various feature extraction methods and several popular pattern classifiers, such as k-NN and
support vector machines, on the Yomiuri Newspaper Corpus. The simulation results show the
effectiveness of the proposed massively parallel machine learning model for text categorization.
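
The abstract does not describe the model's internals, so the following is only a minimal sketch of one general way a text classifier can be split into independent pieces of work. It is not the talk's model. It assumes a k-NN classifier over bag-of-words vectors with cosine similarity, partitions the training set into shards, finds each shard's nearest neighbors independently, and merges the partial results before voting. The names `vectorize`, `shard_neighbors`, `parallel_knn`, and the toy `TRAIN` data are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import heapq
import math

# Toy labelled corpus (hypothetical; stands in for a large newspaper corpus).
TRAIN = [
    ("stock market shares fall", "economy"),
    ("bank raises interest rates", "economy"),
    ("central bank cuts rates", "economy"),
    ("team wins baseball game", "sports"),
    ("pitcher throws perfect game", "sports"),
    ("soccer team wins final", "sports"),
]


def vectorize(text):
    """Bag-of-words term counts; whitespace tokenization is an illustrative choice."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def shard_neighbors(shard, query, k):
    """Return the k most similar (similarity, label) pairs within one shard."""
    scored = [(cosine(vec, query), label) for vec, label in shard]
    return heapq.nlargest(k, scored)


def parallel_knn(train, text, k=3, n_shards=2):
    """Classify `text` by k-NN, searching each training shard independently."""
    query = vectorize(text)
    vectors = [(vectorize(doc), label) for doc, label in train]
    # Round-robin partition of the training set into n_shards pieces.
    shards = [vectors[i::n_shards] for i in range(n_shards)]
    # Threads only illustrate the decomposition; on a cluster each shard
    # would be handled by a separate process or node.
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        partial = pool.map(lambda s: shard_neighbors(s, query, k), shards)
        candidates = [pair for result in partial for pair in result]
    # The global top-k is contained in the union of the per-shard top-k lists.
    top = heapq.nlargest(k, candidates)
    votes = Counter(label for _, label in top)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(parallel_knn(TRAIN, "bank interest rates rise"))
```

Because each shard is searched without reference to the others, the per-shard step can be distributed across machines, and only the small top-k lists need to be gathered for the final vote.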
