1.
RapidMiner5<br />2.7 - Data Mining and RapidMiner<br />
2.
Machine Learning<br />Machine learning algorithms: RapidMiner offers a large number of learning schemes, including:<br />support vector machines (SVM)<br />decision trees<br />rule learners<br />lazy learners<br />Bayesian learners<br />logistic learners<br />association rule mining and clustering<br />meta-learning schemes, including Bayesian Boosting<br />
3.
Machine Learning<br />Decision Trees: This operator learns decision trees from both nominal and numerical data. Decision trees are powerful classification methods that are often also easy to understand. This decision tree learner works similarly to Quinlan's C4.5 or CART. The actual type of the tree is determined by the criterion, e.g. gain ratio for C4.5 or the Gini index for CART.<br />
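To make the two split criteria concrete, here is a minimal sketch of how gain ratio and Gini gain could be computed for a nominal attribute. The function names and the toy data are illustrative assumptions, not RapidMiner's implementation.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of nominal values."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of nominal values."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def partition(values, labels):
    """Group the labels by the attribute value of each example."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())

def gain_ratio(values, labels):
    """C4.5-style criterion: information gain divided by split information."""
    n = len(labels)
    groups = partition(values, labels)
    info_gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0

def gini_gain(values, labels):
    """CART-style criterion: reduction in Gini impurity after the split."""
    n = len(labels)
    groups = partition(values, labels)
    return gini(labels) - sum(len(g) / n * gini(g) for g in groups)

# Hypothetical toy data: the attribute separates the classes perfectly.
outlook = ["sunny", "sunny", "rain", "rain"]
play = ["no", "no", "yes", "yes"]
print(gain_ratio(outlook, play), gini_gain(outlook, play))
```

A tree learner would evaluate the chosen criterion for every candidate attribute and split on the one with the highest value.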
4.
Machine Learning<br />Neural Net: This operator learns a model by means of a feed-forward neural network. The learning is done via back-propagation. The user can define the structure of the neural network with the parameter list "hidden layer types". Each list entry describes a new hidden layer. The key of each entry must correspond to the layer type, which must be one of:<br /> linear<br /> sigmoid (default)<br /> tanh<br /> sine<br /> logarithmic<br /> gaussian<br />
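As a rough illustration of how a list of hidden layer types shapes a feed-forward network, the sketch below runs a forward pass only (no back-propagation). The `ACTIVATIONS` table, the `forward` function, and the weights are assumptions made for this example. The Gaussian form `exp(-x^2)` is one common choice, and the logarithmic type is left out because the source does not state its formula.

```python
import math

# Assumed activation functions for the layer types named above.
ACTIVATIONS = {
    "linear": lambda x: x,
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
    "sine": math.sin,
    "gaussian": lambda x: math.exp(-x * x),
}

def forward(inputs, layers):
    """Propagate inputs through the layers.

    Each layer is (layer_type, weights, biases), where weights holds
    one row of input weights per unit in that layer.
    """
    activation = list(inputs)
    for layer_type, weights, biases in layers:
        f = ACTIVATIONS[layer_type]
        activation = [
            f(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation

# Hypothetical network: one sigmoid hidden layer, one linear output unit.
network = [
    ("sigmoid", [[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0]),
    ("linear", [[1.0, 1.0]], [0.0]),
]
output = forward([0.0, 0.0], network)
print(output)
```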
5.
Machine Learning<br />Bayesian Boosting: This operator trains an ensemble of classifiers for boolean target attributes. In each iteration the training set is reweighted, so that previously discovered patterns and other kinds of prior knowledge are sampled out. An inner classifier, typically a rule or decision tree induction algorithm, is sequentially applied several times, and the models are combined into a single global model. The maximum number of models to train is specified by the parameter iterations.<br />
6.
Meta Learning<br />Bootstrap aggregating (bagging) is a machine learning ensemble meta-algorithm that improves the stability and accuracy of classification and regression models. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree models, it can be used with any type of model. Bagging is a special case of the model averaging approach.<br />
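The idea can be sketched in a few lines: train each base model on a bootstrap sample (drawn with replacement) and combine predictions by majority vote. The one-dimensional threshold "stump" used as the base learner, and all names here, are assumptions chosen to keep the example short.

```python
import random
from collections import Counter

def train_stump(points):
    """Fit a 1-D threshold classifier: predict 1 if x > threshold, else 0."""
    best = None
    for threshold, _ in points:
        errors = sum((x > threshold) != bool(y) for x, y in points)
        if best is None or errors < best[0]:
            best = (errors, threshold)
    return best[1]

def bagging(points, n_models, rng):
    """Train n_models stumps, each on its own bootstrap sample."""
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(points) examples with replacement.
        sample = [rng.choice(points) for _ in points]
        models.append(train_stump(sample))
    return models

def predict(models, x):
    """Combine the individual stump predictions by majority vote."""
    votes = Counter(int(x > threshold) for threshold in models)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
data = [(x, int(x > 5)) for x in range(11)]  # hypothetical toy data
ensemble = bagging(data, n_models=25, rng=rng)
print(predict(ensemble, 0), predict(ensemble, 10))
```

An odd number of models avoids ties in a two-class vote.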
7.
Preprocessing<br />Feature Selection: Assume that we have a dataset with numerous attributes. We would like to test whether all of these attributes are really relevant, or whether we can get a better model by omitting some of the original attributes. This task is called feature selection, and the backward elimination algorithm is one approach to solving it.<br />
8.
Preprocessing<br />Backward Elimination in RapidMiner5:<br /> Enclose the cross-validation chain in a FeatureSelection operator.<br /> This operator repeatedly applies the cross-validation chain, which is now its inner operator, until the specified stopping criterion is met. The backward elimination approach iteratively removes the attribute whose removal yields the largest performance improvement.<br />
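Outside of RapidMiner, the backward elimination loop might be sketched as follows. The `evaluate` callback stands in for the inner cross-validation chain, and `toy_score` with its feature names is a made-up scoring function for demonstration only. Stopping when no removal improves the score is one possible stopping criterion.

```python
def backward_elimination(features, evaluate):
    """Greedily drop the feature whose removal improves the score most.

    evaluate(subset) should return a performance estimate (higher is
    better), e.g. a cross-validated accuracy.
    """
    selected = list(features)
    best_score = evaluate(selected)
    while len(selected) > 1:
        # Score every subset obtained by removing exactly one feature.
        candidates = [
            (evaluate([f for f in selected if f != drop]), drop)
            for drop in selected
        ]
        score, drop = max(candidates)
        if score <= best_score:
            break  # no removal helps any more
        selected.remove(drop)
        best_score = score
    return selected, best_score

# Hypothetical scoring: two useful features, everything else hurts slightly.
USEFUL = {"age": 0.30, "income": 0.25}

def toy_score(subset):
    return 0.5 + sum(USEFUL.get(f, -0.05) for f in subset)

selected, score = backward_elimination(
    ["age", "income", "noise_a", "noise_b"], toy_score
)
print(selected, score)
```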
9.
Preprocessing<br />UserBasedDiscretization: This operator discretizes a numerical attribute into either a nominal or an ordinal attribute. The numerical values are mapped to classes according to thresholds specified by the user, who defines each class by its upper limit.<br />
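A minimal sketch of threshold-based discretization, assuming that each upper limit is inclusive and that the last class is open-ended. The class names and limits are hypothetical.

```python
from bisect import bisect_left

def discretize(value, upper_limits, class_names):
    """Map a number to the first class whose upper limit is >= value.

    upper_limits must be sorted ascending and aligned with class_names.
    """
    index = bisect_left(upper_limits, value)
    return class_names[index]

# Hypothetical classes defined by their upper limits.
limits = [18, 65, float("inf")]
names = ["minor", "adult", "senior"]

ages = [5, 18, 19, 65, 70]
print([discretize(a, limits, names) for a in ages])
```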
10.
Preprocessing<br />Normalization: This operator normalizes the attribute values. This can be done by a range transformation to a user-defined minimum and maximum value, by a z-transformation (i.e. to mean 0 and variance 1), or by a proportional transformation, where each value becomes its proportion of the total sum of the respective attribute.<br />
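The three transformations can be written out directly. This is a sketch under assumptions: the function names are made up, and the z-transformation here uses the population standard deviation, which may differ from what the operator uses. Constant attributes (zero range or zero deviation) would need special handling.

```python
from statistics import mean, pstdev

def range_transform(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so they span [new_min, new_max]."""
    lo, hi = min(values), max(values)
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

def z_transform(values):
    """Shift and scale values to mean 0 and variance 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def proportion_transform(values):
    """Express each value as its share of the attribute's total sum."""
    total = sum(values)
    return [v / total for v in values]

data = [2.0, 4.0, 6.0, 8.0]  # hypothetical attribute values
print(range_transform(data))
print(z_transform(data))
print(proportion_transform(data))
```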
11.
Preprocessing<br />Sampling: This operator performs a random sampling of a given fraction. For example, if the input example set contains 5000 examples and the sample ratio is set to 0.1, the result will have approximately 500 examples.<br />
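The word "approximately" suggests that each example is kept independently with probability equal to the sample ratio. The sketch below assumes that behaviour. The function name and seed are illustrative.

```python
import random

def sample_fraction(examples, ratio, rng):
    """Keep each example independently with probability `ratio`."""
    return [e for e in examples if rng.random() < ratio]

rng = random.Random(42)
examples = list(range(5000))
sample = sample_fraction(examples, 0.1, rng)
print(len(sample))  # close to 500, but not exactly 500 in general
```

If an exact sample size is needed instead, `random.sample(examples, k)` draws exactly `k` examples without replacement.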
12.
Genetic Algorithm<br />Genetic Algorithm: A genetic algorithm for feature selection (mutation=switch features on and off, crossover=interchange used features). Selection is done by roulette wheel. Genetic algorithms are general purpose optimization / search algorithms that are suitable in case of no or little problem knowledge.<br />
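The following sketch shows one way the pieces named above could fit together: a feature subset is a list of on/off flags, mutation flips flags, crossover combines two parents, and parents are drawn by roulette wheel (probability proportional to fitness). The fitness function, population size, and rates are all assumptions for illustration. A real setup would use a validated model performance as fitness.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]

def crossover(a, b, rng):
    """One-point crossover: head of parent a, tail of parent b."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(mask, rate, rng):
    """Switch each feature on or off with probability `rate`."""
    return [(not bit) if rng.random() < rate else bit for bit in mask]

def genetic_feature_selection(n_features, fitness, rng,
                              pop_size=20, generations=30, mutation_rate=0.05):
    population = [
        [rng.random() < 0.5 for _ in range(n_features)]
        for _ in range(pop_size)
    ]
    best = max(population, key=fitness)
    for _ in range(generations):
        fitnesses = [fitness(ind) for ind in population]
        population = [
            mutate(
                crossover(
                    roulette_select(population, fitnesses, rng),
                    roulette_select(population, fitnesses, rng),
                    rng,
                ),
                mutation_rate,
                rng,
            )
            for _ in range(pop_size)
        ]
        # Remember the best individual seen so far.
        best = max(population + [best], key=fitness)
    return best

# Hypothetical fitness: the first three features help, the rest hurt.
# It stays positive, which roulette-wheel selection requires.
def toy_fitness(mask):
    return 1.0 + sum(mask[:3]) - 0.3 * sum(mask[3:])

rng = random.Random(7)
best = genetic_feature_selection(6, toy_fitness, rng)
print(best, toy_fitness(best))
```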
13.
Validation<br />A Simple Validation randomly splits up the example set into a training and test set and evaluates the model.<br />
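A split validation can be sketched as: shuffle, cut at a ratio, train on one part, and measure performance on the other. The majority-class "model", the 70/30 ratio, and the toy data are assumptions chosen for brevity.

```python
import random

def split_validation(examples, train_ratio, rng):
    """Randomly split examples into a training set and a test set."""
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def train_majority(train):
    """Trivial model: always predict the most frequent training label."""
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)

def accuracy(predicted_label, test):
    """Fraction of test examples whose label matches the prediction."""
    return sum(y == predicted_label for _, y in test) / len(test)

rng = random.Random(1)
data = [(i, "yes" if i % 4 else "no") for i in range(100)]  # toy data
train, test = split_validation(data, 0.7, rng)
model = train_majority(train)
print(model, accuracy(model, test))
```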
14.
More Questions?<br />Reach us at support@dataminingtools.net<br />Visit: www.dataminingtools.net<br />