2. HOW WEKA GOT ITS NAME FROM A BIRD
The weka (Gallirallus australis), also known as the Maori hen or woodhen,
is a flightless bird species of the rail family, found in New Zealand.
The software takes its name from this bird. The name is also an acronym
for Waikato Environment for Knowledge Analysis.
3. WHAT IS WEKA?
Weka is machine learning software and a data mining tool written in Java,
developed at the University of Waikato, New Zealand. It is free software
licensed under the GNU General Public License (GPL).
It provides a range of algorithms for classifying data.
Keywords: Data mining, data preprocessing, classification,
cluster analysis, Weka tool etc.
File extension: .arff (Attribute-Relation File Format), for example weather.arff
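An ARFF file is plain text: a @relation line, one @attribute line per column, and a @data section with comma-separated rows. The sketch below reads a small ARFF string using only the Python standard library. The sample rows are invented for illustration and are not the contents of the real weather.arff, and parse_arff is a hypothetical helper that handles only this simple subset (no quoted values, no sparse rows).

```python
# Minimal ARFF reader for illustration (simple subset only).
SAMPLE_ARFF = """\
% hypothetical sample, not the real weather.arff
@relation weather_sample
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,30,no
overcast,25,yes
rainy,18,yes
"""

def parse_arff(text):
    """Return (relation name, [(attribute, type)], [row dicts])."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            names = [name for name, _ in attributes]
            rows.append(dict(zip(names, values)))
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows

relation, attributes, rows = parse_arff(SAMPLE_ARFF)
print(relation)
print(attributes)
print(rows[0])
```

In practice Weka reads ARFF files itself; this sketch only shows how the three parts of the file fit together.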
4. WEKA PROPERTIES...
In 1993 development began at the University of Waikato, New Zealand.
In 1997 it was redeveloped from scratch in Java.
In 2005 it received the SIGKDD Data Mining and Knowledge Discovery
Service Award.
Weka contains tools for data preprocessing and provides a graphical
user interface.
6. ROLE OF WEKA...
INPUT: Raw data
OUTPUT: Result
DATA MINING BY WEKA
1. Preprocessing
2. Classifying
3. Clustering
4. Attribute selection
5. Association
6. Visualization
7. PREDICTION PROBLEMS: CLASSIFICATION
Classification
predicts categorical class labels (discrete or nominal)
classifies data: constructs a model from the training set and the
values (class labels) of a classifying attribute, then uses the model
to classify new data
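The two steps above (construct a model from labelled training data, then use it on new data) can be sketched with a very simple learner. This is a one-attribute rule in the spirit of OneR, not Weka's implementation. The training rows and the names train_one_attribute and classify are made up for illustration.

```python
from collections import Counter, defaultdict

def train_one_attribute(rows, attribute, label):
    """Model construction: map each attribute value to its most common class label."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attribute]][row[label]] += 1
    # Fallback for attribute values never seen in training: overall majority class.
    default = Counter(row[label] for row in rows).most_common(1)[0][0]
    rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    return rules, default

def classify(model, value):
    """Model use: look up the predicted class label for a new attribute value."""
    rules, default = model
    return rules.get(value, default)

# Hypothetical training set: the class label is "buys".
training = [
    {"student": "no", "buys": "no"},
    {"student": "no", "buys": "no"},
    {"student": "yes", "buys": "yes"},
    {"student": "yes", "buys": "yes"},
    {"student": "yes", "buys": "no"},
]

model = train_one_attribute(training, "student", "buys")
print(classify(model, "yes"))
print(classify(model, "no"))
```

Real classifiers use many attributes at once, but the split between a training step and a prediction step is the same.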
8. ROLE OF DATA PREPROCESSING
Measures for data quality: A multidimensional view
Accuracy: correct or wrong, accurate or not
Completeness: not recorded, unavailable, …
Consistency: some modified but some not, dangling, …
Timeliness: timely update?
Believability: how far can the data be trusted to be correct?
Interpretability: how easily the data can be understood?
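Of the measures above, completeness is the easiest to quantify. A minimal sketch, assuming records are held as dictionaries and that a missing value is written as "?" (the ARFF convention), an empty string, or None. The records and the completeness helper are hypothetical.

```python
def completeness(rows, missing="?"):
    """Return, per attribute, the fraction of rows whose value is present."""
    if not rows:
        return {}
    report = {}
    for name in rows[0]:
        present = sum(1 for r in rows if r.get(name) not in (None, "", missing))
        report[name] = present / len(rows)
    return report

# Hypothetical records with two missing entries.
records = [
    {"age": "25", "income": "high"},
    {"age": "?", "income": "low"},
    {"age": "41", "income": ""},
    {"age": "33", "income": "medium"},
]

report = completeness(records)
for name, fraction in report.items():
    print(f"{name}: {fraction:.0%} complete")
```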
9. START WEKA...
From the Windows desktop:
I. Click “Start”, choose “All Programs”
II. Choose “Weka 3.8” to start Weka.
Then the first interface window appears: the Weka GUI Chooser
17. TREE...
TREES IN WEKA ARE USED FOR DECISIONS, SUCH AS:
IF THE TEMPERATURE IS GREATER THAN 30, GO OUTSIDE;
OTHERWISE, DON'T GO OUTSIDE.
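The rule above is a decision tree with a single test and two leaves, which maps directly onto an if/else. A minimal sketch: the threshold of 30 comes from the slide, while the function name and the idea that the reading is a temperature are assumptions.

```python
def go_outside(reading, threshold=30):
    """One-node decision tree: a single test with two leaf outcomes."""
    if reading > threshold:
        return "go outside"
    return "don't go outside"

print(go_outside(35))
print(go_outside(22))
```

Larger trees nest further tests inside each branch.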
18. Decision Tree Induction: An Example
Training data set: Buys_computer
The data set follows an example of Quinlan’s ID3 (Playing Tennis)
age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31..40  high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31..40  low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31..40  medium  no       excellent      yes
31..40  high    yes      fair           yes
>40     medium  no       excellent      no
Resulting tree:
age?
  <=30: student?
    no: no
    yes: yes
  31..40: yes
  >40: credit rating?
    excellent: no
    fair: yes
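ID3 chooses the attribute to test at each node by information gain. The sketch below computes the gain of each attribute over the 14 Buys_computer training rows, using only the Python standard library. The function names are illustrative, and this covers only the choice of the root attribute, not the full tree construction.

```python
from collections import Counter
from math import log2

COLUMNS = ["age", "income", "student", "credit_rating", "buys_computer"]
DATA = """\
<=30 high no fair no
<=30 high no excellent no
31..40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31..40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31..40 medium no excellent yes
31..40 high yes fair yes
>40 medium no excellent no
"""
rows = [dict(zip(COLUMNS, line.split())) for line in DATA.splitlines()]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target="buys_computer"):
    """Entropy of the target minus its weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

gains = {a: information_gain(rows, a) for a in COLUMNS[:-1]}
best = max(gains, key=gains.get)
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {gain:.3f}")
print("root attribute:", best)
```

The attribute age has the highest gain, which is why it sits at the root of the resulting tree.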
19. How to Handle Noisy Data?
Binning
first sort the data and partition it into (equal-frequency) bins
then smooth by bin means, bin medians, bin boundaries, etc.
Regression
smooth by fitting the data to regression functions
Clustering
detect and remove outliers
Combined computer and human inspection
detect suspicious values and have a human check them (e.g., deal with possible outliers)
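The binning steps can be sketched as follows: sort, split into equal-frequency bins, then replace each value by its bin mean or by the nearest bin boundary. The input values are sample numbers chosen for illustration, and the function names are hypothetical.

```python
from statistics import mean

def equal_frequency_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of equal size."""
    ordered = sorted(values)
    size = len(ordered) // n_bins
    bins = [ordered[i * size:(i + 1) * size] for i in range(n_bins)]
    bins[-1].extend(ordered[n_bins * size:])  # any leftovers go to the last bin
    return bins

def smooth_by_means(bins):
    """Replace every value in a bin by the bin mean."""
    return [[mean(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value by the closer of its bin's minimum and maximum."""
    smoothed = []
    for b in bins:
        low, high = b[0], b[-1]
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed

# Sample values for illustration only.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_frequency_bins(prices, 3)
print(bins)
print(smooth_by_means(bins))
print(smooth_by_boundaries(bins))
```

Smoothing by bin medians follows the same pattern with statistics.median in place of mean.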
33. CONCLUSIONS
The overall goal of Weka is to build a state-of-the-art facility for
developing machine learning (ML) techniques and to allow people to
apply them to real-world data mining problems.