RAPIDMINER: Introduction to RapidMiner



Published in: Technology, Education

AGENDA
  • What is Data Mining?
  • Introduction to RapidMiner
  • Use of RapidMiner for Data Mining
  • Download and Installation Steps
  • Memory Usage, Plug-ins & Settings
  • Supported File Formats

What is Data Mining?
The process of analyzing data and extracting patterns from it.
It covers four common classes of tasks:
  • Classification: arrange data into predefined groups
  • Clustering: similar to classification, but the groups are not predefined
  • Regression: find a function that models the data with the least error
  • Association rule learning: search for relationships between variables
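To make the regression task concrete, here is a minimal sketch in plain Python of fitting a straight line by ordinary least squares. It is an illustration only, not how RapidMiner implements regression; the function name `fit_line` and the data points are made up for this example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x and y around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With noisy data the fitted line no longer passes through every point; it is the line for which the summed squared vertical distances are smallest.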
Different levels of analysis are available:
  • Artificial neural networks: non-linear predictive models that resemble biological neural networks in structure
  • Genetic algorithms: optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution
  • Decision trees: provide a set of rules that you can apply to a new dataset to predict the outcome. Examples:
      • Classification and Regression Trees (CART)
      • Chi Square Automatic Interaction Detection (CHAID)
    CART and CHAID are decision tree techniques used for classification of a dataset.
  • Rule induction: the extraction of useful if-then rules from data based on statistical significance
  • Nearest neighbor: classify records based on the k most similar records
  • Data visualization: visual interpretation of complex relationships in multidimensional data
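The nearest neighbor idea from the list above can be sketched in a few lines of plain Python. This is a minimal illustration assuming Euclidean distance and a simple majority vote; the function name `knn_classify`, the labels, and the training points are hypothetical.

```python
from collections import Counter
import math


def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training records.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up training records forming two well-separated groups.
train = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"),
    ((5.2, 4.9), "high"),
    ((4.8, 5.1), "high"),
]
prediction = knn_classify(train, (1.1, 1.0), k=3)
print(prediction)
```

The choice of k trades off noise sensitivity (small k) against blurring of class boundaries (large k).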
Applications
Applications can be divided into four major kinds:
  • Classification
  • Numerical prediction
  • Association
  • Clustering
Some examples:
  • Automatic abstraction
  • Financial forecasting
  • Targeted marketing
  • Medical diagnosis
  • Credit card fraud detection
  • Weather forecasting, etc.
Introduction to RapidMiner
RapidMiner (formerly YALE*) is an environment for machine learning and data mining experiments.
RapidMiner is used for both research and real-world data mining tasks.
Software versions:
  • Community edition (open source)
  • Enterprise edition (Community Edition + More Features + Services + Guarantees)
*YALE: Yet Another Learning Environment
Some properties of RapidMiner:
  • Written in Java
  • Knowledge discovery processes are modelled as operator trees
  • An internal XML representation ensures a standardized interchange format for data mining experiments
  • A scripting language allows for automating large-scale experiments
  • A multi-layered data view concept ensures efficient and transparent data handling
  • GUI, command-line mode (batch mode), and Java API for using RapidMiner from other programs
  • Several plugins already exist
  • Its plotting facility offers a large set of high-dimensional visualization schemes for data and models
  • Applications: text mining, multimedia mining, feature engineering, data stream mining and tracking drifting concepts, development of ensemble methods, and distributed data mining
Use of RapidMiner for Data Mining
Using RapidMiner
  • The process configuration is provided as an XML file
  • The GUI can be used to design the XML description of the operator tree
  • Break points can be used to check the intermediate results
Use from a separate program
  • The command-line version and the Java API can be used to invoke RapidMiner in your programs without using the GUI
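To illustrate what "an XML description of an operator tree" can look like, here is a small Python sketch that builds a nested XML document with the standard library. Everything specific in it is an assumption made for illustration: the element and attribute names (`operator`, `parameter`, `name`, `class`, `key`, `value`), the parameter keys, and the operator called `HypotheticalLearner` are not taken from the slides and should be checked against the RapidMiner documentation for your version.

```python
import xml.etree.ElementTree as ET


def operator(name, cls, params=None, children=()):
    """Build one <operator> node with optional <parameter> and nested operators.

    The tag and attribute names are illustrative assumptions, not a confirmed schema.
    """
    node = ET.Element("operator", {"name": name, "class": cls})
    for key, value in (params or {}).items():
        ET.SubElement(node, "parameter", {"key": key, "value": value})
    node.extend(children)
    return node


# A three-step tree: read data, learn a model, write the model to a file.
# Parameter keys and file names are hypothetical.
process = operator("Root", "Process", children=[
    operator("Input", "ArffExampleSource", {"data_file": "data.arff"}),
    operator("Learner", "HypotheticalLearner"),
    operator("Output", "ModelWriter", {"model_file": "model.mod"}),
])

xml_text = ET.tostring(process, encoding="unicode")
print(xml_text)
```

The point of the tree shape is that each inner operator's output feeds the next step, so a whole experiment can be stored, exchanged, and rerun as a single file.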
Download and Installation Steps
Download
  • The latest version of RapidMiner can be downloaded from http://rapid-i.com/content/blogsection/7/82/lang,en/ by selecting the appropriate version (Windows x86, x64, etc.) and RapidMiner edition
Installation: Windows executable
  • Download the Windows executable (.exe) file
  • Double-click the rapidminer-xxx-install.exe file to run it
  • Follow the instructions
Installation: Java version (any platform)
  • Install JRE 1.5 or above (http://www.java.sun.com)
  • Choose the installation directory and unzip the downloaded file (.zip, .tar, etc.)
      • rapidminer-xxx-bin.zip: contains only the binaries
      • rapidminer-xxx-src.zip: contains binaries + source

Memory Usage, Plugins & General Settings
Memory Usage
  • Complicated data mining tasks need more memory
  • If memory requirements are not met, the entire process might halt
  • Windows: the amount of memory available is automatically calculated and properly set
  • Java version: the amount of usable memory can be increased during installation
Plugins
  • Windows: an executable installer (.exe) automatically installs the plugin to its correct location
  • Java version: copy the .jar file to the lib/plugins subdirectory
General Settings
  • Configuration files:
      • rapidminerrc
      • rapidminerrc.OS (OS: Linux, Windows)
  • Locations scanned, in this order:
      • RapidMiner home directory
      • .rapidminer in your home directory
      • Current working directory
      • File specified by rapidminer.rcfile
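The lookup of configuration files can be pictured with the following Python sketch. It is not RapidMiner code and rests on several assumptions not stated in the slides: that the locations are scanned in the order listed, that a setting found in a later file overrides the same setting from an earlier one, that the files sit directly inside each directory, and that they contain simple `key = value` lines. The function names and example paths are hypothetical.

```python
from pathlib import Path


def candidate_rc_files(rapidminer_home, user_home, cwd, os_name, rcfile=None):
    """List the configuration files to try, in the assumed scan order."""
    names = ["rapidminerrc", f"rapidminerrc.{os_name}"]
    dirs = [Path(rapidminer_home), Path(user_home) / ".rapidminer", Path(cwd)]
    candidates = [d / n for d in dirs for n in names]
    if rcfile:
        # Explicitly named file (the rapidminer.rcfile setting) is tried last.
        candidates.append(Path(rcfile))
    return candidates


def load_settings(paths):
    """Merge key = value lines from every existing file; later files win (assumed)."""
    settings = {}
    for path in paths:
        if not path.is_file():
            continue
        for line in path.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, value = line.split("=", 1)
                settings[key.strip()] = value.strip()
    return settings


# Hypothetical locations, only to show the resulting order.
for path in candidate_rc_files("/opt/rapidminer", "/home/user", "/tmp/work", "Linux"):
    print(path)
```

Layering the files this way lets an installation-wide default be refined per user and then per project without editing the shared file.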
Supported File Formats
  • RapidMiner can read data files, and can read and write models, parameter sets, and attribute sets
  • The most important files are the data files holding the examples (instances)
Data files & attribute description files
  • ARFFEXAMPLESOURCE: reads the .arff format
  • DATABASEEXAMPLESOURCE: reads from databases
  • SPARSEFORMATEXAMPLESOURCE
  • DENSEFORMATEXAMPLESOURCE
  • An attribute description file (.aml) is used to retrieve metadata about the instances
XML attributes that can be set:
  • name: unique name of the attribute
  • sourcefile: name of the file containing the data (the default is used if not specified)
  • sourcecol: column within the file (starting from 1)
  • sourcecol_end: if set, one attribute is generated for each column from sourcecol to sourcecol_end, all with the same properties
  • valuetype: one of nominal, numeric, integer, real, ordered, binominal, polynominal, and file_path
  • blocktype: one of single_value, value_series, value_series_start, value_series_end, interval, interval_start, interval_end
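As a rough illustration of how these XML attributes fit together, the Python sketch below parses a small attribute description and expands a sourcecol to sourcecol_end range into one entry per column. The attribute names come from the list above; the element names (`attributeset`, `attribute`), the `default_source` attribute, the naming of the generated attributes (`sensor_1`, `sensor_2`, ...), and the file names are assumptions made for this example, not confirmed details of the .aml format.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute description; element names and default_source are assumed.
AML = """
<attributeset default_source="data.dat">
  <attribute name="id" sourcecol="1" valuetype="integer" blocktype="single_value"/>
  <attribute name="sensor" sourcecol="2" sourcecol_end="4" valuetype="real" blocktype="single_value"/>
  <attribute name="label" sourcefile="labels.dat" sourcecol="1" valuetype="nominal" blocktype="single_value"/>
</attributeset>
"""


def read_attributes(aml_text):
    """Return one dict per data column described in the attribute description."""
    root = ET.fromstring(aml_text.strip())
    default_source = root.get("default_source")
    columns = []
    for node in root.iter("attribute"):
        first = int(node.get("sourcecol"))
        last = int(node.get("sourcecol_end", first))
        for col in range(first, last + 1):
            name = node.get("name")
            if last > first:
                # Assumed naming scheme for attributes generated from a column range.
                name = f"{name}_{col - first + 1}"
            columns.append({
                "name": name,
                "sourcefile": node.get("sourcefile", default_source),
                "sourcecol": col,
                "valuetype": node.get("valuetype"),
                "blocktype": node.get("blocktype"),
            })
    return columns


columns = read_attributes(AML)
for column in columns:
    print(column["name"], column["sourcefile"], column["sourcecol"], column["valuetype"])
```

Keeping the metadata in a separate description file means the raw data file can stay a plain table of values while types and roles are declared once.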
Model files (.mod files)
  • Contain the models generated by previous runs
  • MODELWRITER: writes model files
  • MODELLOADER: reads model files
  • MODELAPPLIER: applies model files
Attribute construction files (.att files)
  • ATTRIBUTECONSTRUCTIONWRITER: writes an attribute set
  • ATTRIBUTECONSTRUCTIONLOADER: reads an attribute set
Parameter set files (.par files)
  • GRIDPARAMETEROPTIMIZATION: generates a set of optimal parameters for a particular task
  • PARAMETERSETLOADER: loads and uses the parameter files
Attribute weight files (.wgt files)
  • Attribute selection is treated as attribute weighting, which allows for more flexibility
  • ATTRIBUTEWEIGHTSWRITER: writes attribute weights to a file
  • ATTRIBUTEWEIGHTSLOADER: reads the attribute weights
  • ATTRIBUTEWEIGHTSAPPLIER: applies the weights to the example sets
THANK YOU!

More questions?
Reach us at support@dataminingtools.net
Visit: WWW.DATAMININGTOOLS.NET