SlideShare a Scribd company logo
Using R in Data mining for the masses,
Chapter 4
Ulrik Hørlyk Hjort
April 5, 2014
1 Modeling
In the following text “the book” will refer to the the book: “Data Mining for
the masses”
1.1 Correlation Matrix
The only execise in chapter 4 is how to create an correlation matrix on a
data set with with Rapidminer. This is easily done in R. Fiest we import
the data set for chapter 4:
data = read.csv(‘‘Chapter04DataSet.csv’’, sep=’’,’’,header = TRUE)
The we use the buildin cor() function in R to get the correlation matrix on
our data frame:
> cor(data)
Insulation Temperature Heating_Oil Num_Occupants Avg_Age
Insulation 1.00000000 -0.79369606 0.73609688 -0.01256684 0.64298171
Temperature -0.79369606 1.00000000 -0.77365974 0.01251864 -0.67257949
Heating_Oil 0.73609688 -0.77365974 1.00000000 -0.04163508 0.84789052
Num_Occupants -0.01256684 0.01251864 -0.04163508 1.00000000 -0.04803415
Avg_Age 0.64298171 -0.67257949 0.84789052 -0.04803415 1.00000000
Home_Size 0.20071164 -0.21393926 0.38119082 -0.02253438 0.30655725
Home_Size
Insulation 0.20071164
Temperature -0.21393926
Heating_Oil 0.38119082
Num_Occupants -0.02253438
Avg_Age 0.30655725
Home_Size 1.00000000
1
which is equal to the correlation matrix shown in Figure 4-4 in the book
2
1.2 Correlation plot
Execise 9 in chapter 4 shows an example of a scatter plot between the Insu-
lation and Heating Oil attributes. A equivalent scatter plot can be done in
R in the following way:
plot(data$Insulation,data$Heating_Oil,col="blue",type="p", pch=20, cex=.5)
Which generate the following plot:1
1
Use the R help function help(plot) to get an complete list and explanation of the
different arguments it takes
3
4
To prevent the overplotting we can add jitter to the x-axis:
plot(jitter(data$Insulation),data$Heating_Oil,col=’’blue’’,type=’’p’’, pch=20, cex=.5)
5
The plotting can be made more structured by first create a subset data frame
with the attributes we wish to plot:
dd <- data.frame(jitter(data$Insulation),data$Heating_Oil)
And then:
plot(dd)
1.3 3D Scatter plot
It is possible to make 3D scatter plots in R with the library “scatterplot3D”.
If the library is not installed on the system install it with the following
command in the R environment and follow the instrctions given.
install.packages(‘‘scatterplot3d’’)
When the library is installed import it with:
library(scatterplot3d)
And create a 3D scatter plot like the one at Figure 4-9 in the book:
scatterplot3d(data$Insulation,data$Heating_Oil,data$Temperature,pch=20,highlight.3d=T)
Which gives the following plot:
6
7

More Related Content

What's hot

computer notes - Data Structures - 30
computer notes - Data Structures - 30computer notes - Data Structures - 30
computer notes - Data Structures - 30ecomputernotes
 
Heap Sort in Design and Analysis of algorithms
Heap Sort in Design and Analysis of algorithmsHeap Sort in Design and Analysis of algorithms
Heap Sort in Design and Analysis of algorithms
samairaakram
 
Lec 17 heap data structure
Lec 17 heap data structureLec 17 heap data structure
Lec 17 heap data structureSajid Marwat
 
heap Sort Algorithm
heap  Sort Algorithmheap  Sort Algorithm
heap Sort Algorithm
Lemia Algmri
 
Presentation on Heap Sort
Presentation on Heap Sort Presentation on Heap Sort
Presentation on Heap Sort
Amit Kundu
 
Computer notes - Analysis of Union
Computer notes  - Analysis of Union Computer notes  - Analysis of Union
Computer notes - Analysis of Union
ecomputernotes
 
4 heapsort pq
4 heapsort pq4 heapsort pq
4 heapsort pq
hasan Mohammad
 
Heaps
HeapsHeaps
Heap sort
Heap sortHeap sort
Heap sort
zahraa F.Muhsen
 
Heap Sort Algorithm
Heap Sort Algorithm Heap Sort Algorithm
Heap Sort Algorithm
Musaddiq Khan
 
Heapsort ppt
Heapsort pptHeapsort ppt
Heapsort ppt
Mariam Saeed
 
Heap sort
Heap sort Heap sort
DPLYR package in R
DPLYR package in RDPLYR package in R
DPLYR package in R
Bimba Pawar
 
Computer notes - Mergesort
Computer notes - MergesortComputer notes - Mergesort
Computer notes - Mergesort
ecomputernotes
 
Chapter 7.4
Chapter 7.4Chapter 7.4
Chapter 7.4sotlsoc
 
3.7 heap sort
3.7 heap sort3.7 heap sort
3.7 heap sort
Krish_ver2
 

What's hot (20)

computer notes - Data Structures - 30
computer notes - Data Structures - 30computer notes - Data Structures - 30
computer notes - Data Structures - 30
 
Heap sort
Heap sortHeap sort
Heap sort
 
Heap Sort in Design and Analysis of algorithms
Heap Sort in Design and Analysis of algorithmsHeap Sort in Design and Analysis of algorithms
Heap Sort in Design and Analysis of algorithms
 
Lec 17 heap data structure
Lec 17 heap data structureLec 17 heap data structure
Lec 17 heap data structure
 
Heap tree
Heap treeHeap tree
Heap tree
 
heap Sort Algorithm
heap  Sort Algorithmheap  Sort Algorithm
heap Sort Algorithm
 
Presentation on Heap Sort
Presentation on Heap Sort Presentation on Heap Sort
Presentation on Heap Sort
 
Computer notes - Analysis of Union
Computer notes  - Analysis of Union Computer notes  - Analysis of Union
Computer notes - Analysis of Union
 
4 heapsort pq
4 heapsort pq4 heapsort pq
4 heapsort pq
 
Heaps
HeapsHeaps
Heaps
 
Heaps
HeapsHeaps
Heaps
 
Heap sort
Heap sortHeap sort
Heap sort
 
Heap
HeapHeap
Heap
 
Heap Sort Algorithm
Heap Sort Algorithm Heap Sort Algorithm
Heap Sort Algorithm
 
Heapsort ppt
Heapsort pptHeapsort ppt
Heapsort ppt
 
Heap sort
Heap sort Heap sort
Heap sort
 
DPLYR package in R
DPLYR package in RDPLYR package in R
DPLYR package in R
 
Computer notes - Mergesort
Computer notes - MergesortComputer notes - Mergesort
Computer notes - Mergesort
 
Chapter 7.4
Chapter 7.4Chapter 7.4
Chapter 7.4
 
3.7 heap sort
3.7 heap sort3.7 heap sort
3.7 heap sort
 

Similar to Data mining for the masses, Chapter 4, Using R instead of Rapidminer

DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
Dataconomy Media
 
maXbox starter69 Machine Learning VII
maXbox starter69 Machine Learning VIImaXbox starter69 Machine Learning VII
maXbox starter69 Machine Learning VII
Max Kleiner
 
Matlab data import
Matlab data importMatlab data import
Matlab data import
pramodkumar1804
 
R_Proficiency.pptx
R_Proficiency.pptxR_Proficiency.pptx
R_Proficiency.pptx
Shivammittal880395
 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Yao Yao
 
Unraveling The Meaning From COVID-19 Dataset Using Python – A Tutorial for be...
Unraveling The Meaning From COVID-19 Dataset Using Python – A Tutorial for be...Unraveling The Meaning From COVID-19 Dataset Using Python – A Tutorial for be...
Unraveling The Meaning From COVID-19 Dataset Using Python – A Tutorial for be...
Kavika Roy
 
Unit 4_Working with Graphs _python (2).pptx
Unit 4_Working with Graphs _python (2).pptxUnit 4_Working with Graphs _python (2).pptx
Unit 4_Working with Graphs _python (2).pptx
prakashvs7
 
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptxfINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
dataKarthik
 
Distributed Computing for Everyone
Distributed Computing for EveryoneDistributed Computing for Everyone
Distributed Computing for Everyone
Giovanna Roda
 
Decision Tree.pptx
Decision Tree.pptxDecision Tree.pptx
Decision Tree.pptx
Ramakrishna Reddy Bijjam
 
Don't Repeat Yourself, and Automated Code Reviews
Don't Repeat Yourself, and Automated Code ReviewsDon't Repeat Yourself, and Automated Code Reviews
Don't Repeat Yourself, and Automated Code Reviews
Gramener
 
Next Generation Programming in R
Next Generation Programming in RNext Generation Programming in R
Next Generation Programming in R
Florian Uhlitz
 
Lesson 2 data preprocessing
Lesson 2   data preprocessingLesson 2   data preprocessing
Lesson 2 data preprocessing
AbdurRazzaqe1
 
Bt0082 visual basic2
Bt0082 visual basic2Bt0082 visual basic2
Bt0082 visual basic2
Techglyphs
 
ITB Term Paper - 10BM60066
ITB Term Paper - 10BM60066ITB Term Paper - 10BM60066
ITB Term Paper - 10BM60066
rahulsm27
 
January 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table packageJanuary 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table package
Zurich_R_User_Group
 
An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)
Dataspora
 
maXbox starter65 machinelearning3
maXbox starter65 machinelearning3maXbox starter65 machinelearning3
maXbox starter65 machinelearning3
Max Kleiner
 
Data visualization using R
Data visualization using RData visualization using R
Data visualization using R
Ummiya Mohammedi
 
R decision tree
R   decision treeR   decision tree
R decision tree
Learnbay Datascience
 

Similar to Data mining for the masses, Chapter 4, Using R instead of Rapidminer (20)

DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
DN 2017 | Multi-Paradigm Data Science - On the many dimensions of Knowledge D...
 
maXbox starter69 Machine Learning VII
maXbox starter69 Machine Learning VIImaXbox starter69 Machine Learning VII
maXbox starter69 Machine Learning VII
 
Matlab data import
Matlab data importMatlab data import
Matlab data import
 
R_Proficiency.pptx
R_Proficiency.pptxR_Proficiency.pptx
R_Proficiency.pptx
 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
 
Unraveling The Meaning From COVID-19 Dataset Using Python – A Tutorial for be...
Unraveling The Meaning From COVID-19 Dataset Using Python – A Tutorial for be...Unraveling The Meaning From COVID-19 Dataset Using Python – A Tutorial for be...
Unraveling The Meaning From COVID-19 Dataset Using Python – A Tutorial for be...
 
Unit 4_Working with Graphs _python (2).pptx
Unit 4_Working with Graphs _python (2).pptxUnit 4_Working with Graphs _python (2).pptx
Unit 4_Working with Graphs _python (2).pptx
 
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptxfINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
 
Distributed Computing for Everyone
Distributed Computing for EveryoneDistributed Computing for Everyone
Distributed Computing for Everyone
 
Decision Tree.pptx
Decision Tree.pptxDecision Tree.pptx
Decision Tree.pptx
 
Don't Repeat Yourself, and Automated Code Reviews
Don't Repeat Yourself, and Automated Code ReviewsDon't Repeat Yourself, and Automated Code Reviews
Don't Repeat Yourself, and Automated Code Reviews
 
Next Generation Programming in R
Next Generation Programming in RNext Generation Programming in R
Next Generation Programming in R
 
Lesson 2 data preprocessing
Lesson 2   data preprocessingLesson 2   data preprocessing
Lesson 2 data preprocessing
 
Bt0082 visual basic2
Bt0082 visual basic2Bt0082 visual basic2
Bt0082 visual basic2
 
ITB Term Paper - 10BM60066
ITB Term Paper - 10BM60066ITB Term Paper - 10BM60066
ITB Term Paper - 10BM60066
 
January 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table packageJanuary 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table package
 
An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)
 
maXbox starter65 machinelearning3
maXbox starter65 machinelearning3maXbox starter65 machinelearning3
maXbox starter65 machinelearning3
 
Data visualization using R
Data visualization using RData visualization using R
Data visualization using R
 
R decision tree
R   decision treeR   decision tree
R decision tree
 

Recently uploaded

The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 

Recently uploaded (20)

The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 

Data mining for the masses, Chapter 4, Using R instead of Rapidminer

  • 1. Using R in Data mining for the masses, Chapter 4 Ulrik Hørlyk Hjort April 5, 2014 1 Modeling In the following text “the book” will refer to the the book: “Data Mining for the masses” 1.1 Correlation Matrix The only execise in chapter 4 is how to create an correlation matrix on a data set with with Rapidminer. This is easily done in R. Fiest we import the data set for chapter 4: data = read.csv(‘‘Chapter04DataSet.csv’’, sep=’’,’’,header = TRUE) The we use the buildin cor() function in R to get the correlation matrix on our data frame: > cor(data) Insulation Temperature Heating_Oil Num_Occupants Avg_Age Insulation 1.00000000 -0.79369606 0.73609688 -0.01256684 0.64298171 Temperature -0.79369606 1.00000000 -0.77365974 0.01251864 -0.67257949 Heating_Oil 0.73609688 -0.77365974 1.00000000 -0.04163508 0.84789052 Num_Occupants -0.01256684 0.01251864 -0.04163508 1.00000000 -0.04803415 Avg_Age 0.64298171 -0.67257949 0.84789052 -0.04803415 1.00000000 Home_Size 0.20071164 -0.21393926 0.38119082 -0.02253438 0.30655725 Home_Size Insulation 0.20071164 Temperature -0.21393926 Heating_Oil 0.38119082 Num_Occupants -0.02253438 Avg_Age 0.30655725 Home_Size 1.00000000 1
  • 2. which is equal to the correlation matrix shown in Figure 4-4 in the book 2
  • 3. 1.2 Correlation plot Execise 9 in chapter 4 shows an example of a scatter plot between the Insu- lation and Heating Oil attributes. A equivalent scatter plot can be done in R in the following way: plot(data$Insulation,data$Heating_Oil,col="blue",type="p", pch=20, cex=.5) Which generate the following plot:1 1 Use the R help function help(plot) to get an complete list and explanation of the different arguments it takes 3
  • 4. 4
  • 5. To prevent the overplotting we can add jitter to the x-axis: plot(jitter(data$Insulation),data$Heating_Oil,col=’’blue’’,type=’’p’’, pch=20, cex=.5) 5
  • 6. The plotting can be made more structured by first create a subset data frame with the attributes we wish to plot: dd <- data.frame(jitter(data$Insulation),data$Heating_Oil) And then: plot(dd) 1.3 3D Scatter plot It is possible to make 3D scatter plots in R with the library “scatterplot3D”. If the library is not installed on the system install it with the following command in the R environment and follow the instrctions given. install.packages(‘‘scatterplot3d’’) When the library is installed import it with: library(scatterplot3d) And create a 3D scatter plot like the one at Figure 4-9 in the book: scatterplot3d(data$Insulation,data$Heating_Oil,data$Temperature,pch=20,highlight.3d=T) Which gives the following plot: 6
  • 7. 7