RapidMiner: Data Mining And Rapid Miner

Published in:
Technology

- 1. RapidMiner5: 2.7 - Data Mining and RapidMiner
- 2. Machine Learning: RapidMiner offers a large number of learning schemes, including support vector machines (SVM), decision trees, rule learners, lazy learners, Bayesian learners, and logistic learners, as well as association rule mining, clustering, and meta-learning schemes such as Bayesian Boosting.
- 3. Machine Learning: Decision Trees. This operator learns decision trees from both nominal and numerical data. Decision trees are powerful classification methods that are often also easy to understand. This decision tree learner works similarly to Quinlan's C4.5 or CART. The actual type of the tree is determined by the criterion, e.g. gain ratio for C4.5 or the Gini index for CART.
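
The two split criteria named on the decision tree slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not RapidMiner's implementation: the function names and the toy labels are hypothetical, and entropy is measured in bits.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini_gain(parent, partitions):
    """Reduction in Gini impurity achieved by a split (CART-style criterion)."""
    n = len(parent)
    return gini(parent) - sum(len(p) / n * gini(p) for p in partitions)

def gain_ratio(parent, partitions):
    """Information gain divided by split information (C4.5-style criterion)."""
    n = len(parent)
    weighted = sum(len(p) / n * entropy(p) for p in partitions)
    info_gain = entropy(parent) - weighted
    split_info = -sum(len(p) / n * math.log2(len(p) / n)
                      for p in partitions if p)
    return info_gain / split_info if split_info > 0 else 0.0

# Toy example: 10 labels split into two partitions of 5.
parent = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
print(gini_gain(parent, split), gain_ratio(parent, split))
```

A tree learner would evaluate one of these scores for every candidate split and keep the split with the highest value.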
- 4. Machine Learning: Neural Net. This operator learns a model by means of a feed-forward neural network. The learning is done via back-propagation. The user can define the structure of the neural network with the parameter list "hidden layer types". Each list entry describes a new hidden layer. The key of each entry must correspond to the layer type, which must be one of: linear, sigmoid (default), tanh, sine, logarithmic, gaussian.
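
To make the idea of typed hidden layers concrete, here is a small sketch of a feed-forward pass in which each layer carries its own activation type. It is an illustration only: the `(type, weights, biases)` layer format is an assumption made for this example, training by back-propagation is not shown, and the logarithmic and gaussian types are left out because the slide does not give their exact definitions.

```python
import math

# Activation functions keyed by layer type (subset of the slide's list).
ACTIVATIONS = {
    "linear": lambda x: x,
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
    "sine": math.sin,
}

def forward(inputs, layers):
    """Propagate inputs through layers given as (type, weights, biases).

    `weights` holds one row of input weights per neuron in the layer.
    """
    values = list(inputs)
    for layer_type, weights, biases in layers:
        act = ACTIVATIONS[layer_type]
        values = [act(sum(w * v for w, v in zip(row, values)) + b)
                  for row, b in zip(weights, biases)]
    return values

# Hypothetical network: one sigmoid hidden layer, one linear output neuron.
layers = [
    ("sigmoid", [[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0]),
    ("linear", [[1.0, 1.0]], [0.0]),
]
output = forward([1.0, 1.0], layers)
print(output)
```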
- 5. Machine Learning: Bayesian Boosting. This operator trains an ensemble of classifiers for boolean target attributes. In each iteration the training set is reweighted, so that previously discovered patterns and other kinds of prior knowledge are sampled out. An inner classifier, typically a rule or decision tree induction algorithm, is sequentially applied several times, and the models are combined into a single global model. The maximum number of models to train is specified by the parameter iterations.
- 6. Meta Learning: Bootstrap aggregating (bagging) is a machine learning ensemble meta-algorithm that improves the stability and accuracy of classification and regression models. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree models, it can be used with any type of model. Bagging is a special case of the model averaging approach.
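
Bagging as described on the slide can be sketched as: draw bootstrap samples with replacement, train one model per sample, and combine the models by majority vote. The base learner below is a one-threshold decision stump chosen purely for brevity; the function names, the toy data, and the number of models are assumptions for illustration.

```python
import random
from collections import Counter

def train_stump(sample):
    """Fit a one-threshold classifier on (x, label) pairs."""
    best = None
    for t in sorted({x for x, _ in sample}):
        left = [y for x, y in sample if x <= t]
        right = [y for x, y in sample if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != (left_label if x <= t else right_label)
                     for x, y in sample)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label

def bagging(data, n_models=25, seed=0):
    """Train one stump per bootstrap sample and predict by majority vote."""
    rng = random.Random(seed)
    # Each bootstrap sample has the size of the data, drawn with replacement.
    models = [train_stump(rng.choices(data, k=len(data)))
              for _ in range(n_models)]

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict

data = [(x, "low") for x in range(10)] + [(x, "high") for x in range(10, 20)]
predict = bagging(data)
print(predict(2), predict(17))
```

For regression, the vote would be replaced by an average of the individual predictions.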
- 7. Preprocessing: Feature Selection. Assume that we have a dataset with numerous attributes. We would like to test whether all of these attributes are really relevant, or whether we can get a better model by omitting some of the original attributes. This task is called feature selection, and the backward elimination algorithm is one approach that can solve it.
- 8. Preprocessing: Backward Elimination in RapidMiner5. Enclose the cross-validation chain in a FeatureSelection operator. This operator repeatedly applies the cross-validation chain, which now is its inner operator, until the specified stopping criterion is met. The backward elimination approach iteratively removes the attribute whose removal yields the largest performance improvement.
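
The backward elimination loop can be sketched independently of RapidMiner. In this hedged example, `evaluate` is a hypothetical callback that stands in for the inner cross-validation chain, and the stopping criterion is assumed to be "no single removal improves the score"; the toy scoring function is invented for illustration.

```python
def backward_elimination(features, evaluate):
    """Greedy backward elimination over a list of feature names.

    `evaluate` takes a list of features and returns a performance
    score, where higher is better.
    """
    selected = list(features)
    best_score = evaluate(selected)
    while len(selected) > 1:
        # Score every reduced set obtained by dropping one feature.
        candidates = [(evaluate([f for f in selected if f != drop]), drop)
                      for drop in selected]
        score, drop = max(candidates, key=lambda c: c[0])
        if score <= best_score:  # stopping criterion: no improvement
            break
        selected.remove(drop)
        best_score = score
    return selected, best_score

def toy_score(feats):
    """Toy performance: "a" and "b" help, any other attribute hurts."""
    useful = {"a": 0.4, "b": 0.3}
    return sum(useful.get(f, -0.05) for f in feats)

selected, score = backward_elimination(["a", "b", "noise1", "noise2"], toy_score)
print(selected, score)
```

In practice each call to `evaluate` is expensive, since it runs a full cross-validation on the reduced attribute set.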
- 9. Preprocessing: UserBasedDiscretization. This operator discretizes a numerical attribute to either a nominal or an ordinal attribute. The numerical values are mapped to the classes according to the thresholds specified by the user. The user can define the classes by specifying the upper limit of each class.
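
Mapping values to user-defined classes by upper limit can be sketched with a binary search. The class names and limits below are hypothetical, and the sketch assumes that each upper limit is inclusive, which the slide does not specify.

```python
import bisect
import math

def discretize(value, classes):
    """Return the first class whose upper limit is >= value.

    `classes` is a list of (name, upper_limit) pairs sorted by limit.
    """
    limits = [limit for _, limit in classes]
    index = bisect.bisect_left(limits, value)
    if index == len(classes):
        raise ValueError(f"{value} exceeds the highest upper limit")
    return classes[index][0]

# Hypothetical classes; the last one is open-ended.
classes = [("low", 10.0), ("medium", 20.0), ("high", math.inf)]
labels = [discretize(v, classes) for v in [3.5, 10.0, 10.1, 42.0]]
print(labels)
```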
- 10. Preprocessing: Normalization. This operator normalizes attribute values. This can be done by scaling to a user-defined minimum and maximum value, by a z-transformation (i.e. to mean 0 and variance 1), or by a proportional transformation that expresses each value as a proportion of the total sum of the respective attribute.
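
The three normalization methods can be written out for a single attribute as follows. This is a sketch with hypothetical function names; it assumes the population standard deviation for the z-transformation and does not handle constant attributes, where the range or standard deviation is zero.

```python
import statistics

def range_transform(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

def z_transform(values):
    """Shift and scale values to mean 0 and variance 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed)
    return [(v - mean) / std for v in values]

def proportion_transform(values):
    """Express each value as a proportion of the attribute's total sum."""
    total = sum(values)
    return [v / total for v in values]

values = [2.0, 4.0, 4.0, 10.0]
print(range_transform(values))
print(z_transform(values))
print(proportion_transform(values))
```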
- 11. Preprocessing: Sampling. This operator performs a random sampling of a given fraction. For example, if the input example set contains 5000 examples and the sample ratio is set to 0.1, the result will have approximately 500 examples.
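
One way to obtain a sample whose size is only approximately the requested fraction is to keep each example independently with probability equal to the ratio. The sketch below assumes that behavior; the function name and the seed parameter are hypothetical.

```python
import random

def sample_fraction(examples, ratio, seed=None):
    """Keep each example independently with probability `ratio`."""
    rng = random.Random(seed)
    return [ex for ex in examples if rng.random() < ratio]

examples = list(range(5000))
sample = sample_fraction(examples, 0.1, seed=42)
print(len(sample))  # close to 500, but not necessarily exactly 500
```

If an exact sample size is required instead, `random.sample(examples, k)` draws exactly `k` distinct examples.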
- 12. Genetic Algorithm: A genetic algorithm for feature selection (mutation = switch features on and off, crossover = interchange used features). Selection is done by roulette wheel. Genetic algorithms are general-purpose optimization and search algorithms that are suitable when little or no problem knowledge is available.
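
The three ingredients on the slide (mutation, crossover, roulette wheel selection) can be combined into a compact sketch. Several details here are assumptions rather than RapidMiner's behavior: single-point crossover, the population size, the mutation rate, and the toy fitness function. Roulette wheel selection requires strictly positive fitness values, and the sketch needs at least two features.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]

def crossover(a, b, rng):
    """Single-point crossover: swap the tails of two feature masks."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(mask, rate, rng):
    """Switch each feature on or off with probability `rate`."""
    return [(not bit) if rng.random() < rate else bit for bit in mask]

def genetic_feature_selection(n_features, fitness, generations=30,
                              pop_size=20, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        fitnesses = [fitness(ind) for ind in population]
        next_gen = []
        while len(next_gen) < pop_size:
            p1 = roulette_select(population, fitnesses, rng)
            p2 = roulette_select(population, fitnesses, rng)
            c1, c2 = crossover(p1, p2, rng)
            next_gen += [mutate(c1, mutation_rate, rng),
                         mutate(c2, mutation_rate, rng)]
        population = next_gen[:pop_size]
        # Remember the best mask seen so far across all generations.
        best = max(population + [best], key=fitness)
    return best

# Toy fitness: 1 plus the number of positions matching a hidden target mask.
TARGET = [True, False, True, True, False, False]

def toy_fitness(mask):
    return 1 + sum(m == t for m, t in zip(mask, TARGET))

best = genetic_feature_selection(len(TARGET), toy_fitness)
print(best, toy_fitness(best))
```

In a real setting the fitness of a mask would be the validated performance of a model trained on the selected features.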
- 13. Validation: Simple Validation randomly splits the example set into a training set and a test set and evaluates the model on the test set.
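
A simple split validation can be sketched as: shuffle, cut into two parts, train on the first, and measure accuracy on the second. The 70/30 ratio, the function names, and the majority-class learner used as a placeholder model are all assumptions for illustration.

```python
import random
from collections import Counter

def split_validation(examples, train_ratio=0.7, seed=0):
    """Randomly split examples into a training set and a test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def majority_learner(train):
    """Placeholder model: always predict the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def accuracy(model, test):
    """Fraction of test examples the model labels correctly."""
    return sum(model(x) == y for x, y in test) / len(test)

# Toy example set of (attribute, label) pairs.
examples = [(i, "pos" if i % 4 else "neg") for i in range(100)]
train, test = split_validation(examples)
model = majority_learner(train)
print(accuracy(model, test))
```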
- 14. More Questions? Reach us at support@dataminingtools.net. Visit: www.dataminingtools.net
