RapidMiner
Introduction
What is RapidMiner?
• RapidMiner is an environment for machine learning, data mining, text mining, predictive analytics, and business analytics. It is used for research, education, training, rapid prototyping, application development, and industrial applications.
• RapidMiner is one of the most popular machine learning and data analytics environments on the market. Thanks to its newly introduced features, users can do basic to advanced machine learning in RapidMiner.
Why RapidMiner?
• The amount of data being generated and collected keeps increasing, as the Internet of Things clearly shows. This growth in data volume brings new challenges to analysts and data optimization professionals. The pace of development is also increasing, and data analytics and machine learning tools help to keep up with it.
The interface of the Design View of
RapidMiner Studio.
Design View (and Results View)
• You can switch between the Design view and the Results view, the two most used views in the application. The Design view is where you import and prepare data, build and validate models, and apply models. The Results view is where you inspect the results of the process you built in the Design view, as data tables, statistics, or visualizations.
RapidMiner Tabs ‘Design’.
• The Design view contains four major panels by default: Repository, Operators, Process, and Parameters.
• The Repository panel holds the datasets to be fed into the analysis.
• The Operators panel contains more than 500 operators for various professional data analysis tasks.
• The Process panel displays the data analysis process being designed, where it is created and managed.
• The Parameters panel lets users set the parameters of the selected operator or process.
RapidMiner Design
Data pre-processing
Add dataset to the Repository
• The ‘Titanic Unlabeled’ dataset is used for this presentation.
• Import it into the Repository.
Import data
Editing dataset
1. Right-click the dataset and choose Edit.
2. Right-click an attribute and choose Modify Attribute.
Editing dataset
3. Select Label as the role of this attribute, which means the attribute will be treated as the dependent variable.
4. Save the change to the dataset.
Note: This method permanently assigns the Label role to the selected attribute in this dataset until it is changed back.
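As a rough illustration of what the Label role means, the sketch below separates one attribute (the dependent variable) from the remaining attributes. This is plain Python, not RapidMiner code; the rows, the column names, and the helper `split_label` are assumptions made up for this example.

```python
# Made-up rows for illustration; "Survived" plays the Label role here.
rows = [
    {"Age": 22, "Sex": "male", "Survived": "No"},
    {"Age": 38, "Sex": "female", "Survived": "Yes"},
]

def split_label(rows, label):
    """Separate the label (dependent variable) from the other attributes."""
    features = [{k: v for k, v in row.items() if k != label} for row in rows]
    targets = [row[label] for row in rows]
    return features, targets

features, targets = split_label(rows, "Survived")
print(features)
print(targets)
```

A model is then trained to predict `targets` from `features`, which is why only one attribute should carry the Label role.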
After saving the dataset, drag it into the Process panel.
The dataset appears in the Process panel. Drag a cable from the ‘out’ port of the dataset to the ‘res’ port of the panel.
Lastly, click ‘Run’.
This cable means that the output of this module is the result of the designed data process.
Editing dataset
The software automatically switches to the Results view.
The ‘make’ attribute is highlighted, which means it has been assigned the Label role.
Data cleaning
Some operators for cleaning the data:
• Sort operator (before and after)
• Replace operator
• Filter operator
• Replace Missing Values operator
• Discretize operator
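To make the effect of these operators concrete, here is a minimal plain-Python sketch of comparable steps. It is not RapidMiner code and does not reproduce the operators' actual parameters; the sample rows, the helper names, and the bin boundaries are all assumptions chosen for illustration.

```python
# Illustrative rows; the column names and values are made up for this sketch.
rows = [
    {"name": "A", "age": 22.0, "sex": "male"},
    {"name": "B", "age": None, "sex": "female"},
    {"name": "C", "age": 38.0, "sex": "female"},
    {"name": "D", "age": 4.0, "sex": "male"},
]

def replace_missing(rows, column):
    """Fill missing values in `column` with the mean of the known values."""
    known = [r[column] for r in rows if r[column] is not None]
    mean = sum(known) / len(known)
    return [{**r, column: mean if r[column] is None else r[column]} for r in rows]

def replace_value(rows, column, old, new):
    """Replace every occurrence of `old` in `column` with `new`."""
    return [{**r, column: new if r[column] == old else r[column]} for r in rows]

def filter_rows(rows, predicate):
    """Keep only the rows for which `predicate` is true."""
    return [r for r in rows if predicate(r)]

def sort_rows(rows, column, descending=False):
    """Sort the rows by the values of `column`."""
    return sorted(rows, key=lambda r: r[column], reverse=descending)

def discretize(rows, column, bins, new_column):
    """Map a numeric column to the label of the first bin whose upper bound it does not exceed."""
    def label_for(value):
        for upper, label in bins:
            if value <= upper:
                return label
        raise ValueError(f"{value} is above every bin")
    return [{**r, new_column: label_for(r[column])} for r in rows]

# Hypothetical bins: (upper bound, label), in ascending order.
age_bins = [(12, "child"), (30, "young"), (float("inf"), "adult")]

cleaned = replace_missing(rows, "age")
cleaned = replace_value(cleaned, "sex", "male", "M")
cleaned = filter_rows(cleaned, lambda r: r["age"] >= 18)
cleaned = sort_rows(cleaned, "age")
cleaned = discretize(cleaned, "age", age_bins, "age_group")

for row in cleaned:
    print(row)
```

Each helper returns a new list instead of modifying its input, which loosely mirrors how a process passes data from one operator to the next.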
Auto Model
Data Mining Method Selection
• The details of the chosen dataset are displayed. The car sales dataset is used to predict whether a customer will buy a car, based on the available input parameters. It has eleven input (x) parameters and one label (y).
• There are three actions we can choose for our dataset: Predict, Clusters, and Outliers. Outliers detects outliers in the data, Clusters detects common groups in the data, and Predict classifies the data from the given input parameters.
• Here we can inspect the dataset’s input parameters.
• Select Predict to do classification, select the Survived column as the label (classification target), and click the Next button.
Importing Dataset
Data
Balance
The Quality column helps us make a decision about each column. It consists of five measures, abbreviated CISMT:
• Correlation (C): measures the linear correlation between the data column and the target column.
• ID-ness (I): measures how likely the column is to be an ID.
• Stability (S): measures how constant the column is; a high value indicates that nearly all values are identical.
• Missing (M): measures how many values are missing in the column.
• Text-ness (T): measures how likely the column is to contain free text.
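The sketch below shows one plausible way to compute four of these measures in plain Python. The formulas are assumptions made for illustration (share of distinct values for ID-ness, share of the most frequent value for Stability, share of empty cells for Missing, Pearson correlation for Correlation) and may differ from what RapidMiner actually computes. Text-ness is left out because its definition is not given here.

```python
import math
from collections import Counter

def correlation(xs, ys):
    """Pearson correlation between two numeric lists of equal length."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (spread_x * spread_y)

def id_ness(values):
    """Assumed definition: share of distinct values among non-missing ones."""
    present = [v for v in values if v is not None]
    return len(set(present)) / len(present) if present else 0.0

def stability(values):
    """Assumed definition: share of the most frequent non-missing value."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return Counter(present).most_common(1)[0][1] / len(present)

def missing(values):
    """Share of missing (None) values in the column."""
    return sum(v is None for v in values) / len(values)

print(id_ness([1, 2, 3, 4]))
print(stability(["a", "a", "a", "b"]))
print(missing([1, None, 3, None]))
print(correlation([1, 2, 3], [2, 4, 6]))
```

Under these assumed definitions, a column with ID-ness or Stability close to 1, or with a large Missing share, is usually a candidate for exclusion from the model.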
• Algorithm Selection
• Next comes algorithm selection. RapidMiner offers several popular classification algorithms to choose from.
• This is the list of algorithms you can choose:
1. Naive Bayes
2. Generalized Linear Model
3. Logistic Regression
4. Fast Large Margin
5. Deep Learning
6. Decision Tree
7. Random Forest
8. Gradient Boosted Trees
9. Support Vector Machine
• We can select all of them if the dataset is small, but we need to choose wisely with a large dataset: the more algorithms are selected, the more time and hardware resources are needed.
• After choosing them, click the Run button.
• Getting Insights from the
Result
• The more algorithms have been chosen, the longer processing takes. After a short wait, the results are presented as a table and charts.
• Getting Insights from the
Result
• Look at Deep Learning: it has two badges. They show that Deep Learning achieved the best overall performance and the lowest computation cost.
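The idea behind the two badges can be sketched as picking the best entry of a results table on two different criteria. The numbers below are hypothetical placeholders, not RapidMiner output, and the use of accuracy and runtime as the two criteria is an assumption.

```python
# Hypothetical numbers for illustration only; not RapidMiner output.
results = {
    "Naive Bayes": {"accuracy": 0.78, "runtime_s": 1.2},
    "Decision Tree": {"accuracy": 0.80, "runtime_s": 0.9},
    "Deep Learning": {"accuracy": 0.82, "runtime_s": 3.5},
}

# "Best performance" badge: the model with the highest accuracy.
best_accuracy = max(results, key=lambda name: results[name]["accuracy"])

# "Low cost" badge: the model with the shortest runtime.
fastest = min(results, key=lambda name: results[name]["runtime_s"])

print("Best accuracy:", best_accuracy)
print("Fastest:", fastest)
```

In this made-up table the two badges go to different models; a single model can also earn both, as Deep Learning does on the slide.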
• Exporting Result
• Imagine having to prepare the data manually and write the classification code for a deep learning algorithm. That alone would take many hours, and it is only one algorithm; coding all of them and creating the visualizations would take far longer.
• In just ten minutes we have finished our data mining process without the fuss of coding it from the ground up. All we have to do is click the Next button and finish.
• Exporting Result
• We can also save the result in various formats, Excel being one of them. Click the Export button in the previous dialog, then click the desired format to finish.
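For comparison, exporting a small result table by hand might look like the sketch below. It writes CSV, which Excel can open, because the Python standard library has no Excel writer. The table contents are hypothetical placeholders, not RapidMiner output.

```python
import csv
import io

# Hypothetical result rows for illustration only.
result_rows = [
    {"model": "Naive Bayes", "accuracy": 0.78},
    {"model": "Deep Learning", "accuracy": 0.82},
]

# Write to an in-memory buffer; use open("results.csv", "w", newline="")
# instead to produce a file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["model", "accuracy"])
writer.writeheader()
writer.writerows(result_rows)

exported = buffer.getvalue()
print(exported)
```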
• Conclusion
• No-code development platforms can greatly simplify data mining work.
• RapidMiner is an effective tool for data mining tasks and saves a lot of time.
• RapidMiner also includes data pre-processing and algorithm selection.
• At the end of the task, RapidMiner presents visualizations that help us gain insight.
• Tasks done in RapidMiner take far less effort than manual coding.
Big thanks to Dr. Marian Mamdouh for the great effort
