2. Definitions review
• Cluster: A collection of data objects
– similar (or related) to one another within the
same group
– dissimilar (or unrelated) to the objects in other
groups
• Cluster analysis
– Finding similarities between data according to the
characteristics found in the data and grouping
similar data objects into clusters
3. Clustering Methods
• Partitioning:
– Unsupervised learning algorithms that construct various
partitions and then evaluate them by some criterion,
e.g., minimizing the sum of squared errors
– Typical methods: k-means, k-medoids
• Hierarchical :
– Create a hierarchical decomposition of the set of data
(or objects) using some criterion
– Typical methods: DIANA, AGNES, BIRCH, ROCK,
CHAMELEON
5. Illustration of two clustering techniques
using the RapidMiner tool and Java
• K-means algorithm:
We performed two tests
1. Using a Java program with the parameters:
K = 2;
Data:
22 21
19 20
18 22
1 3
3 2
6. K-means Clustering
• Input: the number of clusters K and the collection of n
instances
• Output: a set of k clusters that minimizes the squared error
criterion
• Method:
– Arbitrarily choose k instances as the initial cluster centers
– Repeat
• (Re)assign each instance to the cluster to which the
instance is the most similar, based on the mean value of
the instances in the cluster
• Update cluster means (compute mean value of the
instances for each cluster)
– Until no change in the assignment
• Squared Error Criterion
– E = ∑_{i=1}^{k} ∑_{p∈C_i} |p − m_i|²
– where m_i is the mean of cluster C_i and p ranges over the points in that cluster
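The method above can be sketched in Java on the slide-5 test data. This is a minimal illustration, not the exact program used in the tests: it assumes the first k instances are taken as the initial centers (the slides say "arbitrarily choose"), which makes the run reproducible; the class and method names are invented for this sketch.

```java
import java.util.Arrays;

public class KMeansDemo {
    // Returns a cluster label (0..k-1) for each instance.
    // Assumption: initial centers = first k instances (slides say "arbitrarily").
    public static int[] cluster(double[][] data, int k) {
        double[][] centers = new double[k][];
        for (int i = 0; i < k; i++) centers[i] = data[i].clone();
        int[] labels = new int[data.length];
        boolean changed = true;
        while (changed) {                       // Until no change in the assignment
            changed = false;
            // (Re)assign each instance to the nearest center
            for (int i = 0; i < data.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (dist2(data[i], centers[c]) < dist2(data[i], centers[best]))
                        best = c;
                if (best != labels[i]) { labels[i] = best; changed = true; }
            }
            // Update cluster means (mean value of the instances in each cluster)
            for (int c = 0; c < k; c++) {
                double[] mean = new double[data[0].length];
                int count = 0;
                for (int i = 0; i < data.length; i++)
                    if (labels[i] == c) {
                        count++;
                        for (int d = 0; d < mean.length; d++) mean[d] += data[i][d];
                    }
                if (count > 0)
                    for (int d = 0; d < mean.length; d++) centers[c][d] = mean[d] / count;
            }
        }
        return labels;
    }

    // Squared Euclidean distance between two instances
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    public static void main(String[] args) {
        double[][] data = {{22, 21}, {19, 20}, {18, 22}, {1, 3}, {3, 2}};
        System.out.println(Arrays.toString(cluster(data, 2)));
        // The three large instances land in one cluster, the two small in the other
    }
}
```

On the slide-5 data with K = 2, the algorithm converges after a few passes, separating {(22,21), (19,20), (18,22)} from {(1,3), (3,2)}.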
11. K-medoids
• Input: the number of clusters K and the collection of n
instances
• Output: A set of k clusters that minimizes the sum of the
dissimilarities of all the instances to their nearest medoids
• Method:
– Arbitrarily choose k instances as the initial medoids
– Repeat
• (Re)assign each remaining instance to the cluster with
the nearest medoid
• Randomly select a non-medoid instance Or
• Compute the total cost, S, of swapping a medoid Oj with Or
• If S<0 then swap Oj with Or to form the new set of k
medoids
– Until no change
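A small Java sketch of the method above (PAM-style). Two assumptions beyond the slides: the initial medoids are the first k instances, and every (medoid, non-medoid) swap is tried exhaustively instead of selecting Or at random; names are illustrative.

```java
public class KMedoidsDemo {
    // Returns the data indices of the final medoids.
    public static int[] medoids(double[][] data, int k) {
        int[] med = new int[k];
        for (int i = 0; i < k; i++) med[i] = i;    // initial medoids: first k instances
        boolean improved = true;
        while (improved) {                         // Until no change
            improved = false;
            double best = cost(data, med);
            for (int m = 0; m < k; m++) {
                for (int o = 0; o < data.length; o++) {
                    if (contains(med, o)) continue;
                    int old = med[m];
                    med[m] = o;                    // try swapping Oj with Or
                    double c = cost(data, med);
                    if (c < best) { best = c; improved = true; }  // S < 0: keep the swap
                    else med[m] = old;             // otherwise undo it
                }
            }
        }
        return med;
    }

    // Sum of dissimilarities of all instances to their nearest medoid
    static double cost(double[][] data, int[] med) {
        double total = 0;
        for (double[] p : data) {
            double nearest = Double.MAX_VALUE;
            for (int m : med) nearest = Math.min(nearest, dist(p, data[m]));
            total += nearest;
        }
        return total;
    }

    static boolean contains(int[] a, int v) {
        for (int x : a) if (x == v) return true;
        return false;
    }

    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        double[][] data = {{22, 21}, {19, 20}, {18, 22}, {1, 3}, {3, 2}};
        int[] m = medoids(data, 2);
        System.out.println(java.util.Arrays.toString(m));
        // One medoid comes from the large group, one from the small group
    }
}
```

Unlike k-means, the cluster representatives here are actual instances, which is what makes the method more robust to outliers.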
15. Comparison
• On our test data, both algorithms produce the same clusters
• Both require the number of clusters K to be specified in the input
• K-medoids is less influenced by outliers in the data
• Both methods assign each instance to exactly one cluster