RAPIDMINER: Introduction To Datamining

  • 2. AGENDA
    • What is Data Mining?
    • Introduction to RapidMiner
    • Use of RapidMiner for Data Mining
    • Download and Installation Steps
    • Memory Usage, Plug-ins & Settings
    • Supported File Formats
  • What is Data Mining?
    Process of analyzing data and extracting patterns from it
    Four main classes of tasks:
    Classification – Arrange into predefined groups
    Clustering – Similar to classification but the groups are not predefined
    Regression – To find a function that models the data with least error
    Association rule learning – Search for relationships
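
As a small illustration of the regression task listed above, the following sketch fits a straight line by ordinary least squares in plain Python. The function name `fit_line` and the sample data are made up for this example; it does not show how RapidMiner implements regression.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sample data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

The "least error" here is the sum of squared vertical distances between the data points and the line; other error measures lead to other fits.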
  • 8. Different levels of analysis that are available:
    Artificial neural networks – Non-linear predictive models that resemble biological neural networks in structure.
    Genetic algorithms - Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.
    Decision trees – Provide a set of rules that you can apply to a new dataset to predict the outcome.
    • Classification and Regression Trees (CART)
    • Chi-square Automatic Interaction Detection (CHAID)
    CART and CHAID are decision tree techniques used for classification of a dataset.
    Rule induction – The extraction of useful if-then rules from data based on statistical significance.
    Nearest neighbor – Classify records based on the k most similar records
    Data visualization - Visual interpretation of complex relationships in multidimensional data.
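
The nearest-neighbor idea from the list above can be sketched in a few lines of Python. This is a minimal illustration assuming Euclidean distance and majority voting; the function name `knn_classify` and the toy data are hypothetical and not taken from RapidMiner.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training records.

    train: list of (features, label) pairs, where features is a tuple of numbers.
    """
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups.
train = [
    ((1.0, 1.0), "low"), ((1.2, 0.8), "low"), ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"), ((5.2, 4.9), "high"), ((4.8, 5.1), "high"),
]
prediction = knn_classify(train, (1.1, 1.0), k=3)
print(prediction)
```

`math.dist` requires Python 3.8 or later.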
  • 10. Applications
    Can be divided into four major kinds:
    Numerical prediction
    Some examples:
    Automatic abstraction
    Financial forecasting
    Targeted marketing
    Medical diagnosis
    Credit card fraud detection
    Weather forecasting etc.
  • 11. Introduction to RapidMiner
    RapidMiner (formerly YALE*) is an environment for machine learning and data mining experiments.
    RapidMiner is used for both research and real-world data mining tasks.
    Software versions:
    • Community edition (open source)
    • Enterprise edition (Community edition + more features + services + guarantees)
    *YALE - Yet Another Learning Environment
  • 13. Some properties of RapidMiner:
    Written in Java
    Knowledge discovery processes are modelled as operator trees
    Internal XML representation ensures standardized interchange format of data mining experiments
    Scripting language allows for automating large-scale experiments
    Multi-layered data view concept ensures efficient and transparent data handling
    GUI, command-line mode (batch mode), and Java API for using RapidMiner from other programs
    Several plugins already exist
    A large set of high-dimensional visualization schemes for data and models offered by its plotting facility.
    Applications: text mining, multimedia mining, feature engineering, data stream mining and tracking drifting concepts, development of ensemble methods, and distributed data mining.
  • 14. Use of RapidMiner for Data Mining
    Using RapidMiner
    • Process configuration is provided as an XML file
    • GUI can be used to design the XML description of the operator tree
    • Breakpoints can be used to check intermediate results
    Use from a separate program
    Command line version and Java API can be used to invoke RapidMiner in your programs without using the GUI
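
To make the idea of a process described as an XML operator tree more concrete, the sketch below parses a small XML document and walks its nested operators. The element names, attribute names, and operator class names in `PROCESS_XML` are illustrative assumptions modeled loosely on RapidMiner process files; they have not been checked against any particular RapidMiner version.

```python
import xml.etree.ElementTree as ET

# Hypothetical process description: nested <operator> elements form the tree.
PROCESS_XML = """
<operator name="Root" class="Process">
  <operator name="Input" class="ExampleSource">
    <parameter key="attributes" value="data.aml"/>
  </operator>
  <operator name="Validation" class="XValidation">
    <operator name="Learner" class="DecisionTree"/>
    <operator name="ApplyAndEvaluate" class="OperatorChain">
      <operator name="Applier" class="ModelApplier"/>
      <operator name="Evaluator" class="Performance"/>
    </operator>
  </operator>
</operator>
""".strip()


def walk(op, depth=0):
    """Return (depth, name, class) for each operator, in document order."""
    rows = [(depth, op.get("name"), op.get("class"))]
    for child in op.findall("operator"):
        rows.extend(walk(child, depth + 1))
    return rows


root = ET.fromstring(PROCESS_XML)
rows = walk(root)
for depth, name, cls in rows:
    print("  " * depth + f"{name} ({cls})")
```

Because the whole experiment lives in one XML file, the same description can be edited in the GUI, kept under version control, or handed to a batch run.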
  • 17. Download and Installation Steps
    The latest version of RapidMiner can be downloaded from
    by selecting the appropriate version (Windows x86, x64, etc.) and RapidMiner edition
    Windows executable
    Download the Windows executable (.exe) file
    Double-click the rapidminer-xxx-install.exe file to run it
    Follow the instructions
  • 18. Java version (any platform)
    Install JRE 1.5 or above (http://www.java.sun.com)
    Choose the installation directory and unzip the downloaded file (.zip, .tar, etc.)
    • rapidminer-xxx-bin.zip – contains only the binaries
    • rapidminer-xxx-src.zip – contains binaries + source
  • Memory Usage, Plugins & General Settings
    Memory Usage
    Complicated data mining tasks need more memory.
    If memory requirements are not met, the entire process might halt.
    • Windows – amount of memory available is automatically calculated and properly set
    • Java version – amount of usable memory can be increased during installation.
  • Plugins
    • Windows – Executable installer (.exe) automatically installs the plugin to its correct location
    • Java version – Copy the .jar file to the lib/plugins subdirectory
    General Settings
    • Configuration files:
    rapidminerrc.OS (OS: Linux, Windows)
    • Locations scanned
    RapidMiner Home directory
    .rapidminer in your home directory
    Current working directory
    File specified by rapidminer.rcfile
  • 22. Supported File Formats
    Can read data files, read & write models, parameter sets and attribute sets.
    Most important: files containing examples (instances)
  • 23. Data files & attribute description files
    ARFFEXAMPLESOURCE – To read files in .arff format
    DATABASEEXAMPLESOURCE – To read from databases
    Attribute description file (.aml) in order to retrieve metadata about the instances
    XML Attributes that can be set:
    Name – unique name of the attribute
    Sourcefile – name of the file containing the data (default used if not specified)
    Sourcecol – column within the file (starting from 1)
    Sourcecol_end – if set, one attribute is generated for each column from sourcecol to sourcecol_end, all with the same properties
    Valuetype – one out of nominal, numeric, integer, real, ordered, binominal, polynominal and file_path
    Blocktype – one out of single_value, value_series, value_series_start, value_series_end, interval, interval_start, interval_end
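
The sketch below shows how the XML attributes described above might fit together in an attribute description file, and how a `sourcecol` to `sourcecol_end` range expands into several attributes. The sample document is an assumption for illustration: the root element `attributeset`, its `default_source` attribute, the `label` element, and the `name_column` naming of generated attributes are not confirmed by the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical .aml content; element names and file names are assumptions.
AML = """
<attributeset default_source="data.dat">
  <attribute name="id" sourcecol="1" valuetype="integer" blocktype="single_value"/>
  <attribute name="sensor" sourcecol="2" sourcecol_end="4" valuetype="real" blocktype="value_series"/>
  <label name="class" sourcecol="5" valuetype="nominal" blocktype="single_value"/>
</attributeset>
""".strip()


def expand(aml_text):
    """Return one dict per data column described by the attribute file."""
    root = ET.fromstring(aml_text)
    default_source = root.get("default_source")
    columns = []
    for el in root:
        name = el.get("name")
        first = int(el.get("sourcecol"))
        # Without sourcecol_end, the description covers a single column.
        last = int(el.get("sourcecol_end", first))
        for col in range(first, last + 1):
            columns.append({
                "role": el.tag,
                # Naming of generated attributes is an assumption of this sketch.
                "name": name if first == last else f"{name}_{col}",
                "sourcefile": el.get("sourcefile", default_source),
                "sourcecol": col,
                "valuetype": el.get("valuetype"),
                "blocktype": el.get("blocktype"),
            })
    return columns


columns = expand(AML)
for column in columns:
    print(column)
```

Using a range keeps the description short when a file has many columns of the same type, for example a long series of sensor readings.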
  • 24. Model files (.mod files)
    Contains the models generated by previous runs
    MODELWRITER – to write model files
    MODELLOADER – to read model files
    MODELAPPLIER – to apply model files
    Attribute construction files (.att files)
    ATTRIBUTECONSTRUCTIONWRITER – writes an attribute set
    ATTRIBUTECONSTRUCTIONLOADER – reads an attribute set
    Parameter set files (.par files)
    GRIDPARAMETEROPTIMIZATION – generates a set of optimal parameters for a particular task
    PARAMETERSETLOADER – use the parameter files
    Attribute weight files (.wgt files)
    Attribute selection is seen as attribute weighting, which allows for more flexibility
    ATTRIBUTEWEIGHTSWRITER – to write attribute weights to a file
    ATTRIBUTEWEIGHTSLOADER – to read the attribute weights
    ATTRIBUTEWEIGHTSAPPLIER – to apply in the example sets
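
The point that attribute selection can be seen as a special case of attribute weighting can be sketched as follows: weights of 0 and 1 reproduce plain selection, while intermediate weights rescale an attribute instead of dropping it. The function `apply_weights` and the sample data are hypothetical; this is not a description of how ATTRIBUTEWEIGHTSAPPLIER works internally.

```python
def apply_weights(examples, weights, drop_zero=True):
    """Scale each attribute by its weight; optionally drop zero-weight attributes.

    examples: list of dicts mapping attribute name to numeric value.
    weights: dict mapping attribute name to weight (missing names default to 1.0).
    """
    result = []
    for example in examples:
        scaled = {}
        for name, value in example.items():
            weight = weights.get(name, 1.0)
            if drop_zero and weight == 0.0:
                continue  # a zero weight deselects the attribute
            scaled[name] = value * weight
        result.append(scaled)
    return result


# Hypothetical example set with one uninformative attribute.
examples = [{"age": 30.0, "income": 50.0, "noise": 7.0}]

# Pure selection: weights restricted to 0 and 1.
selection = {"age": 1.0, "income": 1.0, "noise": 0.0}
# General weighting: "age" is kept but counts half as much.
weighting = {"age": 0.5, "income": 1.0, "noise": 0.0}

selected = apply_weights(examples, selection)
weighted = apply_weights(examples, weighting)
print(selected)
print(weighted)
```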
  • 25. THANK YOU!
  • 26. More questions…
    Reach us at support@dataminingtools.net