Text classification with Weka

Text Classification With Weka
Waikato Environment of Knowledge Discovery

Text Classification With Weka
Weka is a collection of machine learning algorithms for data mining tasks.
These algorithms can be either applied directly on data sets or can be called
from your java/python code.
With Weka you can do classification, clustering, regression and maybe other
ml and data mining tasks.
Weka was developed at the university of Waikato, New Zealand in 1993 as a
free software available under the GNU General Public license.

Background
Machine Learning :
- Is type of Artificial Intelligence (AI) that provides computers with the ability to
learn without being explicitly programmed.
- A computer program is said to learn from experience E with respect to some class
of tasks T and performance measure P . If its performance P at some tasks int T
improves with experience E.
- E : is the data we learn from .
- T : is the task we are trying to achieve.
- P : the measurement of how accurate the algorithm is.

Background
Machine Learning has tow types:
- Supervised Learning where the computer is provided with set of instances each
instance has set of features and label and the task is to learn general rule to map
these features to labels. Example of this type is ( text classification )
- Unsupervised Learning No labels are given to the computer. and the task is to
figure patterns and relations between the instances based on the features it
contains.

Text Classification
The task of assigning predefined categories for free text documents. It can
provide conceptual views of documents collections and has important
application in the real world.
How do we do text classification :
1- Data preparation.
2- Text processing ( tokenization => removing stop words => stemming => attribute
selection)
3- train the model using the extracted text features
4- evaluating the model.

OpenSooq Text Classification
Task Definition :
Building a software to predict post category from its description. The software will be an
assistant to help the users to find the appropriate category for their selling.
How to build this software ?
We will use Text Classification with Weka to build ml model that learns from the posts
that opensooq already has.

- Loading posts from database : Using mysql-connector provided by weka we
will load 500 posts from three categories. Each post (instance) contains description
and category

- Text processing : Using StringToWordVector filter provided by Weka we will
transform description into word vector.

- Learning Model : We will build two models using the training data we have and
compare their performance. These two models are : ( J48 and Naive Bayes ).

- Model Evaluation : Measuring the accuracy of the model. It can be
done by applying the model on a training data that already linked with
labels and compare how many instance will be guessed right.
- In Weka you can do model evaluation by :
- Feeding the model with set of testing data
- Splitting data into training/testing sets.
- cross validation.

- Using the model to predict posts categories:
- We will be using PyWeka to load the model in
python and use it to predict the categories of the
post.

Text classification with Weka

More Related Content

What's hot

Viewers also liked

Similar to Text classification with Weka

Recently uploaded

Text classification with Weka