Data mining: Classification and prediction

Data mining: Classification and prediction Presentation Transcript

  • 1. Data Mining: Classification and Prediction
  • 2. Classification and Prediction
    A data analysis task is classification when a model, or classifier, is constructed to predict categorical labels.
    A data analysis task is numeric prediction when the model constructed predicts a continuous-valued function, or ordered value, as opposed to a categorical label.
    Such a model is called a predictor.
  • 3. Steps and issues in preparing the Data for Classification and Prediction
    Preparing the data:
    Data cleaning
    Relevance analysis
    Data transformation and reduction
    Comparing Classification and Prediction Methods:
    Accuracy
    Speed
    Robustness
    Scalability
    Interpretability
  • 4. Classification by Decision Tree Induction
    Decision tree induction is the learning of decision trees from class-labeled training tuples. A decision tree is a flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label.
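    The structure described above can be sketched in a few lines. This is only an illustration of how an already-built tree classifies a tuple, not of the induction step itself; the attributes (age, student) and class labels are hypothetical and do not come from the slides.

```python
# A hand-built decision tree as nested dicts. The attributes and labels are
# hypothetical examples, not taken from the slides.
tree = {
    "attribute": "age",                  # internal node: a test on an attribute
    "branches": {                        # each branch is one outcome of the test
        "youth": {
            "attribute": "student",
            "branches": {"yes": "buys", "no": "does_not_buy"},
        },
        "middle_aged": "buys",           # leaf node: holds a class label
        "senior": "does_not_buy",
    },
}


def classify(node, tuple_):
    """Follow the branches matching the tuple's attribute values down to a leaf."""
    while isinstance(node, dict):
        outcome = tuple_[node["attribute"]]
        node = node["branches"][outcome]
    return node


label = classify(tree, {"age": "youth", "student": "yes"})
print(label)
```

    Induction algorithms build such a structure automatically by choosing, at each node, the attribute test that best separates the classes in the training tuples.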
  • 5. Tree Pruning
    When a decision tree is built, many of the branches will reflect anomalies in the training data due to noise or outliers.
    Tree pruning methods address this problem of over-fitting the data.
    Scalability and Decision Tree Induction
    Problem: most often, the training data will not fit in memory.
    Decision tree construction then becomes inefficient because training tuples are swapped in and out of main and cache memory, which is why scalable decision tree induction methods are needed.
  • 6. Bayesian Classification
    Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class.
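    As a rough illustration of predicting class membership probabilities, the sketch below uses a naive Bayes classifier, which is one kind of Bayesian classifier and assumes the attributes are independent given the class. The training tuples and attribute names are invented for the example, and no smoothing is applied, so an attribute value never seen with a class gives that class probability zero.

```python
from collections import Counter, defaultdict

# Hypothetical training tuples: (attribute values, class label).
training = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "overcast", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
]

class_counts = Counter(label for _, label in training)
# value_counts[label][attribute][value] = tuples of that class with that value
value_counts = defaultdict(lambda: defaultdict(Counter))
for attrs, label in training:
    for attribute, value in attrs.items():
        value_counts[label][attribute][value] += 1


def class_probabilities(attrs):
    """Return P(class | attrs) under the naive independence assumption."""
    scores = {}
    for label, n in class_counts.items():
        score = n / len(training)                                # prior P(class)
        for attribute, value in attrs.items():
            score *= value_counts[label][attribute][value] / n   # P(value | class)
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


probabilities = class_probabilities({"outlook": "sunny", "windy": "yes"})
print(probabilities)
```

    The returned values sum to one, so they can be read directly as the probability that the given tuple belongs to each class.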
  • 7. Bayesian belief network
    A Bayesian network, belief network, or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG).
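    A minimal sketch of this idea follows: each node stores the probability of being true given its parents, and the joint probability of a full assignment is the product of those conditional probabilities. The three-variable network (Rain, Sprinkler, WetGrass) and every number in it are invented for illustration.

```python
# Hypothetical network: Rain -> WetGrass <- Sprinkler. All numbers are invented.
parents = {"Rain": [], "Sprinkler": [], "WetGrass": ["Rain", "Sprinkler"]}

# cpt[variable][tuple of parent values] = P(variable is True | parents)
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.1},
    "WetGrass": {
        (True, True): 0.99,
        (True, False): 0.9,
        (False, True): 0.8,
        (False, False): 0.0,
    },
}


def joint_probability(assignment):
    """P(assignment) as the product of P(variable | its parents) over all nodes."""
    p = 1.0
    for variable, parent_names in parents.items():
        parent_values = tuple(assignment[name] for name in parent_names)
        p_true = cpt[variable][parent_values]
        p *= p_true if assignment[variable] else 1.0 - p_true
    return p


p = joint_probability({"Rain": True, "Sprinkler": False, "WetGrass": True})
print(p)
```

    Because the graph is acyclic, every variable appears exactly once in the product, conditioned only on its direct parents.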
  • 8. Training Bayesian Belief Networks
    In the learning or training of a belief network, a number of scenarios are possible.
    The network topology (or “layout” of nodes and arcs) may be given in advance or inferred from the data.
    The network variables may be observable or hidden in all or some of the training tuples. The case of hidden data is also referred to as missing values or incomplete data.
  • 9. Back propagation
    Back propagation is a neural network learning algorithm. The field of neural networks was originally kindled by psychologists and neurobiologists who sought to develop and test computational analogues of neurons.
    Back propagation learns by iteratively processing a data set of training tuples, comparing the network’s prediction for each tuple with the actual known target value, and adjusting the network’s weights to reduce the error between the two.
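    The loop below is a small sketch of that process for a network with one hidden layer and sigmoid units. The network size, learning rate, random seed, and the logical-OR training tuples are all assumptions chosen to keep the example short; it is not meant as a reference implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


random.seed(0)
n_in, n_hidden = 2, 2
# w_hidden[j][i]: weight from input i to hidden unit j; the last entry is the bias.
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
# w_out[j]: weight from hidden unit j to the single output unit; the last entry is the bias.
w_out = [random.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]


def forward(x):
    hidden = [sigmoid(sum(w * v for w, v in zip(ws, x + [1.0]))) for ws in w_hidden]
    output = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, output


def train_step(x, target, rate=0.5):
    """One weight update for a single training tuple; returns the squared error."""
    hidden, output = forward(x)
    # Output error term: sigmoid derivative times (target - prediction).
    delta_out = output * (1.0 - output) * (target - output)
    # Hidden error terms: delta_out propagated backwards through the output weights.
    delta_hidden = [h * (1.0 - h) * w_out[j] * delta_out for j, h in enumerate(hidden)]
    for j, v in enumerate(hidden + [1.0]):
        w_out[j] += rate * delta_out * v
    for j in range(n_hidden):
        for i, v in enumerate(x + [1.0]):
            w_hidden[j][i] += rate * delta_hidden[j] * v
    return (target - output) ** 2


# Hypothetical training tuples (logical OR), processed repeatedly.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
first_epoch_error = sum(train_step(x, t) for x, t in data)
for _ in range(2000):
    last_epoch_error = sum(train_step(x, t) for x, t in data)
print(first_epoch_error, last_epoch_error)
```

    The summed squared error after the last pass should be lower than after the first, which is the effect of repeatedly comparing predictions with targets and correcting the weights.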
  • 10. Classification by Association Rule Analysis
    Frequent patterns and their corresponding association or correlation rules characterize interesting relationships between attribute conditions and class labels, and thus have been recently used for effective classification.
    Association rules show strong associations between attribute-value pairs (or items) that occur frequently in a given data set.
    Association rules are commonly used to analyze the purchasing patterns of customers in a store.
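    How "frequent" and "strong" are usually quantified can be sketched with the two standard measures, support and confidence. The market-basket transactions below are hypothetical.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    return support(antecedent | consequent) / support(antecedent)


# Rule: bread => milk
rule_support = support({"bread", "milk"})
rule_confidence = confidence({"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

    A rule is typically kept only if both values exceed user-chosen minimum thresholds; for classification, the consequent of the rule is a class label.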
  • 11. Training tuples
    Eager learners: when given a set of training tuples, an eager learner constructs a generalization (i.e., classification) model before receiving new (e.g., test) tuples to classify.
    Lazy learners: the learner instead waits until the last minute before doing any model construction in order to classify a given test tuple. That is, when given a training tuple, a lazy learner simply stores it (or does only a little minor processing) and waits until it is given a test tuple.
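    A nearest-neighbor classifier is a common example of the lazy approach and makes the contrast concrete: training does nothing but store tuples, and all the work happens when a test tuple arrives. The class name, points, and labels below are illustrative assumptions.

```python
import math


class NearestNeighbor:
    """A lazy learner: training only stores tuples; the work happens at query time."""

    def __init__(self):
        self.stored = []

    def train(self, point, label):
        self.stored.append((point, label))   # no model is built here

    def classify(self, query):
        # Compare the test tuple with every stored tuple and copy the closest label.
        _, label = min(self.stored, key=lambda pair: math.dist(pair[0], query))
        return label


learner = NearestNeighbor()
for point, label in [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((5.0, 5.0), "b")]:
    learner.train(point, label)

prediction = learner.classify((1.5, 0.5))
print(prediction)
```

    An eager learner such as a decision tree would instead spend its effort in the training step and answer queries cheaply afterwards.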
  • 12. Other classification methods
    Genetic Algorithms: Genetic algorithms attempt to incorporate ideas of natural evolution.
    Rough Set Approach: Rough set theory can be used for classification to discover structural relationships within imprecise or noisy data.
    Fuzzy Set Approaches: Rule-based systems for classification have the disadvantage that they involve sharp cutoffs for continuous attributes.
  • 13. Prediction in Data mining
    Linear Regression: Straight-line regression analysis involves a response variable, y, and a single predictor variable, x. It is the simplest form of regression, and models y as a linear function of x.
    Nonlinear Regression: A polynomial regression model can be transformed into a linear regression model, which is then used to predict the values.
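    For the straight-line case, the coefficients can be estimated with the usual least-squares formulas. The sketch below assumes the model y = w0 + w1 * x; the names w0 and w1 and the data points are made up for the example.

```python
# Hypothetical (x, y) observations.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)

# Least-squares estimates for the line y = w0 + w1 * x.
w1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
    (x - x_mean) ** 2 for x in xs
)
w0 = y_mean - w1 * x_mean


def predict(x):
    """Predict the response variable y for a new value of the predictor x."""
    return w0 + w1 * x


print(w0, w1, predict(5.0))
```

    A polynomial model such as y = w0 + w1 * x + w2 * x^2 can be handled with the same linear machinery by treating x^2 as an additional predictor variable.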
  • 14. Ensemble Methods for Increasing the Accuracy in prediction
    Bagging and Boosting
    The bagging algorithm creates an ensemble of models (classifiers or predictors) for a learning scheme where each model gives an equally weighted prediction.
    In boosting, weights are assigned to each training tuple. A series of k classifiers is iteratively learned. After a classifier M_i is learned, the weights are updated to allow the subsequent classifier, M_{i+1}, to “pay more attention” to the training tuples that were misclassified by M_i.
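    The weight update can be sketched as follows, assuming an AdaBoost-style rule in which the weights of correctly classified tuples are multiplied by error / (1 - error) and all weights are then renormalized. The slides do not name a specific boosting algorithm, so this choice, the function name, and the example weights are assumptions.

```python
def update_weights(weights, misclassified):
    """One AdaBoost-style round: shrink weights of correct tuples, then renormalize."""
    error = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    if not 0.0 < error < 0.5:
        raise ValueError("this sketch expects a weighted error strictly between 0 and 0.5")
    factor = error / (1.0 - error)   # below 1, applied to correctly classified tuples only
    scaled = [w if wrong else w * factor for w, wrong in zip(weights, misclassified)]
    total = sum(scaled)
    return [w / total for w in scaled]


# Four training tuples with equal initial weights; the last one was misclassified.
weights = [0.25, 0.25, 0.25, 0.25]
misclassified = [False, False, False, True]
new_weights = update_weights(weights, misclassified)
print(new_weights)
```

    After the update the misclassified tuple carries a larger share of the total weight, so the next classifier in the series is pushed to get it right.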
  • 15. Visit more self-help tutorials
    Pick a tutorial of your choice and browse through it at your own pace.
    The tutorials section is free, self-guiding and will not involve any additional support.
    Visit us at www.dataminingtools.net