RapidMiner: Data Mining And Rapid Miner
RapidMiner: Data Mining And Rapid Miner Presentation Transcript

  • 1. RapidMiner5
    2.7 - Data Mining and RapidMiner
  • 2. Machine Learning
    Machine learning algorithms: RapidMiner offers a large number of learning schemes, including:
    support vector machines (SVM),
    decision trees,
    rule learners,
    lazy learners,
    Bayesian learners,
    logistic learners,
    association rule mining and clustering,
    meta-learning schemes, including Bayesian Boosting.
  • 3. Machine Learning
    Decision Trees: This operator learns decision trees from both nominal and numerical data. Decision trees are powerful classification methods that are often also easy to understand. This decision tree learner works similarly to Quinlan's C4.5 or CART. The actual type of the tree is determined by the criterion, e.g. gain ratio for C4.5 or the Gini index for CART.
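    To make the role of the criterion concrete, here is a minimal sketch of how gain ratio and a Gini-based gain could be computed for one candidate split. The function names and the toy labels are illustrative assumptions and are not taken from RapidMiner.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gain_ratio(labels, groups):
    """C4.5-style criterion: information gain divided by split information.
    groups holds one list of labels per branch of the candidate split."""
    n = len(labels)
    info_gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups if g)
    return info_gain / split_info if split_info > 0 else 0.0

def gini_gain(labels, groups):
    """CART-style criterion: reduction in Gini impurity achieved by the split."""
    n = len(labels)
    return gini(labels) - sum(len(g) / n * gini(g) for g in groups)

labels = ["yes", "yes", "no", "no"]
perfect = [["yes", "yes"], ["no", "no"]]   # split that separates the classes
useless = [["yes", "no"], ["yes", "no"]]   # split that separates nothing
```

    Under either criterion the split in `perfect` scores higher than the one in `useless`, so a tree learner would prefer it.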
  • 4. Machine Learning
    Neural Net: This operator learns a model by means of a feed-forward neural network. The learning is done via back-propagation. The user can define the structure of the neural network with the parameter list "hidden layer types". Each list entry describes a new hidden layer. The key of each entry must correspond to the layer type, which must be one of:
    linear
    sigmoid (default)
    tanh
    sine
    logarithmic
    gaussian
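    As an illustration of how a list of hidden layer types shapes the network, the following sketch runs a forward pass through layers described by their type. It does not include back-propagation. The exact formulas RapidMiner uses for "logarithmic" and "gaussian" are not given in the slides, so the ones below are assumptions, as are the weights and the `forward` helper.

```python
import math

# One activation function per layer type named on the slide.
ACTIVATIONS = {
    "linear": lambda x: x,
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
    "sine": math.sin,
    # Assumed form: a sign-preserving logarithm.
    "logarithmic": lambda x: math.copysign(math.log1p(abs(x)), x),
    # Assumed form: an unnormalized Gaussian bump.
    "gaussian": lambda x: math.exp(-x * x),
}

def forward(inputs, layers):
    """Propagate inputs through the layers.
    Each layer is (layer_type, weights, biases); weights[j] holds the
    incoming weights of unit j in that layer."""
    values = inputs
    for layer_type, weights, biases in layers:
        act = ACTIVATIONS[layer_type]
        values = [act(sum(w * v for w, v in zip(row, values)) + b)
                  for row, b in zip(weights, biases)]
    return values

# Hypothetical network: one sigmoid layer with two units, then a linear unit.
hidden_layers = [
    ("sigmoid", [[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0]),
    ("linear", [[1.0, 1.0]], [0.0]),
]
output = forward([1.0, 1.0], hidden_layers)
```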
  • 5. Machine Learning
    Bayesian Boosting: This operator trains an ensemble of classifiers for boolean target attributes. In each iteration the training set is reweighted, so that previously discovered patterns and other kinds of prior knowledge are sampled out. An inner classifier, typically a rule or decision tree induction algorithm, is applied sequentially several times, and the resulting models are combined into a single global model. The maximum number of models to train is specified by the parameter iterations.
  • 6. Meta Learning
    Bootstrap aggregating (bagging) is a machine learning ensemble meta-algorithm that improves the stability and accuracy of classification and regression models. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree models, it can be used with any type of model. Bagging is a special case of the model averaging approach.
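    A minimal sketch of the idea, assuming a simple threshold "stump" as the base learner and majority voting to combine the models. The helper names and the toy data are hypothetical; any learner that returns a prediction function could be passed in as `fit`.

```python
import random
from collections import Counter

def fit_stump(data):
    """Base learner: pick the threshold rule with the fewest training errors.
    data is a list of (x, label) pairs with labels 0 or 1."""
    best = None
    for t, _ in data:
        for hi in (0, 1):  # label predicted when x >= t
            errors = sum((hi if x >= t else 1 - hi) != y for x, y in data)
            if best is None or errors < best[0]:
                best = (errors, t, hi)
    _, t, hi = best
    return lambda x: hi if x >= t else 1 - hi

def bagging(data, fit, n_models, rng):
    """Train n_models base learners on bootstrap samples and combine them
    by majority vote."""
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(data) examples with replacement.
        sample = [rng.choice(data) for _ in data]
        models.append(fit(sample))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict

data = [(x, int(x >= 5)) for x in range(10)]
predict = bagging(data, fit_stump, n_models=25, rng=random.Random(0))
```

    Each stump sees a slightly different sample, so individual thresholds vary; the vote smooths out that variation, which is where the variance reduction comes from.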
  • 7. Preprocessing
    Feature Selection: Assume that we have a dataset with numerous attributes. We would like to test whether all of these attributes are really relevant, or whether we can get a better model by omitting some of the original attributes. This task is called feature selection, and the backward elimination algorithm is one approach to solving it.
  • 8. Preprocessing
    Backward Elimination in RapidMiner5:
    Enclose the cross-validation chain in a FeatureSelection operator.
    This operator repeatedly applies the cross-validation chain, which is now its inner operator, until the specified stopping criterion is met. In each round, the backward elimination approach removes the attribute whose removal yields the largest performance improvement.
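    The loop can be sketched as follows. Here `evaluate` stands in for the inner cross-validation chain, and the stopping criterion is assumed to be "no single removal improves the score"; both the toy scoring function and the attribute names are made up for illustration.

```python
def backward_elimination(features, evaluate):
    """Start with all features and repeatedly drop the one whose removal
    improves the score the most, until no removal helps."""
    selected = list(features)
    best_score = evaluate(selected)
    while len(selected) > 1:
        # Score every subset obtained by leaving out exactly one feature.
        candidates = [(evaluate([f for f in selected if f != drop]), drop)
                      for drop in selected]
        score, drop = max(candidates)
        if score <= best_score:  # assumed stopping criterion
            break
        selected.remove(drop)
        best_score = score
    return selected, best_score

# Toy stand-in for cross-validated performance: relevant attributes add to
# the score, irrelevant ones slightly reduce it.
RELEVANT = {"age", "income"}

def toy_evaluate(subset):
    return sum(1.0 if f in RELEVANT else -0.1 for f in subset)

selected, score = backward_elimination(["age", "income", "zip", "id"], toy_evaluate)
```

    With this toy score, the two irrelevant attributes are removed one per round and the search stops once only `age` and `income` remain.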
  • 9. Preprocessing
    UserBasedDiscretization: This operator discretizes a numerical attribute into either a nominal or an ordinal attribute. The numerical values are mapped to classes according to thresholds specified by the user, who defines each class by its upper limit.
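    A small sketch of mapping values to classes by upper limits. It assumes that each upper limit is inclusive and that the last class is unbounded; the slides do not state how RapidMiner treats boundary values, and the class names here are invented.

```python
import bisect

def discretize(value, upper_limits, class_names):
    """Return the class whose upper limit is the first one >= value.
    upper_limits must be sorted and have one entry per class."""
    index = bisect.bisect_left(upper_limits, value)
    return class_names[index]

limits = [18, 65, float("inf")]          # upper limit of each class
names = ["minor", "adult", "senior"]
ages = [5, 18, 40, 70]
classes = [discretize(a, limits, names) for a in ages]
```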
  • 10. Preprocessing
    Normalization: This operator normalizes attribute values. This can be done by scaling to a user-defined minimum and maximum value, by a z-transformation (i.e. to mean 0 and variance 1), or by a proportional transformation that expresses each value as a proportion of the total sum of the respective attribute.
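    The three transformations can be sketched for a single attribute as shown below. The function names are illustrative, the z-transformation is assumed to use the population standard deviation, and constant attributes (which would cause a division by zero) are not handled.

```python
import statistics

def range_transform(values, new_min=0.0, new_max=1.0):
    """Scale values linearly so that they span [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

def z_transform(values):
    """Shift and scale values to mean 0 and variance 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed)
    return [(v - mean) / std for v in values]

def proportion_transform(values):
    """Express each value as a proportion of the attribute's total sum."""
    total = sum(values)
    return [v / total for v in values]

data = [2.0, 4.0, 4.0, 10.0]
scaled = range_transform(data)
standardized = z_transform(data)
proportions = proportion_transform(data)
```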
  • 11. Preprocessing
    Sampling: This operator performs a random sampling of a given fraction. For example, if the input example set contains 5000 examples and the sample ratio is set to 0.1, the result will have approximately 500 examples.
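    One way to read "approximately 500" is that each example is kept independently with probability equal to the sample ratio. The sketch below makes that assumption; it is not confirmed by the slides, and the `sample` helper and seed are illustrative.

```python
import random

def sample(examples, ratio, rng):
    """Keep each example independently with probability ratio, so the
    result size is only approximately ratio * len(examples)."""
    return [e for e in examples if rng.random() < ratio]

examples = list(range(5000))
subset = sample(examples, 0.1, random.Random(42))
```

    If an exact sample size were needed instead, `random.sample(examples, 500)` would draw exactly 500 examples without replacement.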
  • 12. Genetic Algorithm
    Genetic Algorithm: A genetic algorithm for feature selection (mutation = switching features on and off, crossover = interchanging the features used). Selection is done by roulette wheel. Genetic algorithms are general-purpose optimization and search algorithms that are suitable when little or no problem knowledge is available.
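    A compact sketch of these three operators on boolean feature masks. It assumes single-point crossover, a fixed mutation rate of 0.05, an even population size, and strictly positive fitness values (required for roulette-wheel weights). The toy fitness function stands in for a model evaluation and is purely hypothetical.

```python
import random

def mutate(mask, rate, rng):
    """Switch each feature on or off with probability rate."""
    return [(not bit) if rng.random() < rate else bit for bit in mask]

def crossover(a, b, rng):
    """Single-point crossover: interchange the tails of two feature masks."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def roulette_select(population, fitnesses, rng):
    """Roulette wheel: draw parents with probability proportional to fitness."""
    return rng.choices(population, weights=fitnesses, k=len(population))

def evolve(fitness, n_features, pop_size, generations, rng):
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        parents = roulette_select(population, [fitness(m) for m in population], rng)
        population = []
        for a, b in zip(parents[::2], parents[1::2]):
            for child in crossover(a, b, rng):
                population.append(mutate(child, 0.05, rng))
        best = max(population + [best], key=fitness)  # remember the best mask seen
    return best

# Toy fitness: agreement with a hypothetical "ideal" feature subset, plus 1
# so that every weight on the roulette wheel is positive.
TARGET = [True, True, False, False, False, False]

def toy_fitness(mask):
    return 1 + sum(m == t for m, t in zip(mask, TARGET))

best = evolve(toy_fitness, n_features=6, pop_size=10, generations=20,
              rng=random.Random(0))
```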
  • 13. Validation
    A Simple Validation randomly splits the example set into a training set and a test set, trains the model on the training set, and evaluates it on the test set.
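    A minimal sketch of such a split validation, assuming a 70/30 split and accuracy as the performance measure. The majority-class learner is only a placeholder; `fit` can be any function that returns a prediction function.

```python
import random

def split_validation(examples, train_ratio, fit, rng):
    """Shuffle the examples, train on the first part, and measure accuracy
    on the held-out remainder."""
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    train, test = shuffled[:cut], shuffled[cut:]
    model = fit(train)
    accuracy = sum(model(x) == y for x, y in test) / len(test)
    return accuracy, len(train), len(test)

def fit_majority(train):
    """Placeholder learner: always predict the most frequent training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

examples = [(i, "a") for i in range(70)] + [(i, "b") for i in range(30)]
accuracy, n_train, n_test = split_validation(
    examples, 0.7, fit_majority, random.Random(1))
```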
  • 14. More Questions?
    Reach us at support@dataminingtools.net
    Visit: www.dataminingtools.net