INTRODUCTION TO R AND RATTLE
IAUSHIRAZ, 1/14/2017
What is R
Statistical Programming Language
Used by statisticians and data miners to develop statistical software and to analyze data.
Free and Open Source
Written in C, Fortran and R
Statistical features
Linear and nonlinear modeling
Statistical tests
Classification, Clustering
Can manipulate R Objects with C, C++, Java, .NET or Python code.
Source Example
> x <- c(1,2,3,4,5,6) # Create ordered collection (vector)
> y <- x^2 # Square the elements of x
> print(y) # print (vector) y
[1] 1 4 9 16 25 36
> mean(y) # Calculate average (arithmetic mean) of (vector) y; result is scalar
[1] 15.16667
> var(y) # Calculate sample variance
[1] 178.9667
> lm_1 <- lm(y ~ x) # Fit a linear regression model "y = f(x)" or "y = B0 + (B1 * x)"
# store the results as lm_1
> print(lm_1) # Print the model from the (linear model object) lm_1
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
-9.333 7.000
> summary(lm_1) # Compute and print statistics for the fit
# of the (linear model object) lm_1
Call:
lm(formula = y ~ x)
Residuals:
1 2 3 4 5 6
3.3333 -0.6667 -2.6667 -2.6667 -0.6667 3.3333
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -9.3333 2.8441 -3.282 0.030453 *
x 7.0000 0.7303 9.585 0.000662 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.055 on 4 degrees of freedom
Multiple R-squared: 0.9583, Adjusted R-squared: 0.9478
F-statistic: 91.88 on 1 and 4 DF, p-value: 0.000662
> par(mfrow=c(2, 2)) # Request 2x2 plot layout
> plot(lm_1) # Diagnostic plot of regression model
Graphical front-ends
Architect – cross-platform open source IDE based on Eclipse and StatET
DataJoy – online R editor aimed at data science beginners and at collaboration.
Deducer – GUI for menu-driven data analysis (similar to SPSS/JMP/Minitab).
Java GUI for R – cross-platform stand-alone R terminal and editor based on Java (also known as JGR).
Number Analytics – cloud-based GUI for R-based business analytics (similar to SPSS).
Rattle GUI – cross-platform GUI based on RGtk2 and specifically designed for data mining.
R Commander – cross-platform menu-driven GUI based on tcltk (several plug-ins to Rcmdr are also available).
Revolution R Productivity Environment (RPE) – Visual Studio-based IDE provided by Revolution Analytics, with plans for a web-based point-and-click interface.
RGUI – comes with the pre-compiled version of R for Microsoft Windows.
RKWard – extensible GUI and IDE for R.
RStudio – cross-platform open source IDE (which can also be run on a remote Linux server).
What is Rattle
A Graphical User Interface (GUI) Package for R
Developed by Graham Williams of Togaware Pty Ltd.
Free and Open Source
Presents statistical and visual summaries of data
Tabs :
Load Data
Data Exploration
Model
Evaluation
Test
…
Rattle Installation Process
Download and Install R
https://r-project.org
About 60MB
Download the Rattle Package
About 300MB
Follow these instructions (R is case-sensitive, so the function names must be lowercase):
> install.packages("rattle", dependencies=c("Depends", "Suggests"))
> library(rattle)
> rattle()
Load Data
Dataset Types :
CSV File (CSV, TXT, Excel)
ARFF (CSV File which adds type information)
ODBC (MySQL, SQLite, SQL Server, …)
- Set Connections in: /etc/odbcinst.ini and /etc/odbc.ini
R Dataset (Existing Datasets in the Current R Session)
R Data File
Library (Pre-Existing Datasets)
Corpus (Collection of Documents)
Script (Scripts for Generating Datasets)
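Outside the GUI, the CSV option corresponds roughly to base R's read.csv(). The following is a minimal sketch, not Rattle's own code; the file contents and column names (id, age, income) are made up for illustration.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
# The columns (id, age, income) are hypothetical.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,age,income",
             "1,34,52000",
             "2,45,61000",
             "3,29,NA"), csv_path)

# Load the file into a data frame; empty cells and "NA" become missing values.
ds <- read.csv(csv_path, header = TRUE, na.strings = c("NA", ""))

str(ds)    # column types inferred by read.csv
nrow(ds)   # number of observations
```

The resulting data frame is the kind of object the "R Dataset" option would pick up from the current session.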
Load Data
Variable Types :
Input (Most Variables are Inputs)
- Used to Predict the Target Variable
Target (Influenced by the Input Variables)
- Also Known as the Output
- Prefix: TARGET_
Risk (Measure of the Size of the Target)
- Prefix: RISK_
Identifier (any Numeric Variable that has a Unique Value for each observation; Not Normally used in modeling)
- Such as: ID, Date
- Prefix: ID_
Ignore (Excluded from Modeling)
- Prefix: IGNORE_
Weight (Observation Weights given by an R Formula)
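The prefix convention above can be sketched in plain R. The helper guess_role() below is hypothetical (it is not a Rattle function) and only illustrates how a role could be derived from a variable name; the variable names are made up.

```r
# Hypothetical helper (not part of Rattle): guess a variable's role
# from the name prefixes listed above; anything else is treated as an input.
guess_role <- function(var_names) {
  role <- rep("input", length(var_names))
  role[grepl("^TARGET_", var_names)] <- "target"
  role[grepl("^RISK_",   var_names)] <- "risk"
  role[grepl("^ID_",     var_names)] <- "ident"
  role[grepl("^IGNORE_", var_names)] <- "ignore"
  setNames(role, var_names)
}

# Made-up variable names for illustration.
vars <- c("ID_customer", "age", "income", "IGNORE_notes",
          "RISK_amount", "TARGET_default")
roles <- guess_role(vars)
print(roles)
```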
Transform
Rescale
Normalize
- Recenter
- Scale [0-1]
- Median/MAD
- Natural Log / Log 10
- Matrix
Order
- Rank
- Interval
- Number of Groups
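Most of the rescaling options have a one-line equivalent in base R. The sketch below uses made-up values and assumes that Recenter means "subtract the mean and divide by the standard deviation"; Rattle's exact implementation may differ.

```r
x <- c(2, 4, 4, 5, 7, 9, 40)   # made-up values with one outlier

recentered <- (x - mean(x)) / sd(x)             # Recenter: mean 0, sd 1 (assumed definition)
scaled01   <- (x - min(x)) / (max(x) - min(x))  # Scale [0-1]
robust     <- (x - median(x)) / mad(x)          # Median/MAD: robust to the outlier
logged     <- log(x)                            # Natural log
logged10   <- log10(x)                          # Log 10
ranked     <- rank(x)                           # Rank (ties share the average rank)

round(cbind(x, recentered, scaled01, robust, ranked), 2)
```

The Median/MAD version is the one least affected by the outlier, which is the usual reason to prefer it on skewed data.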
Transform
Impute (missing values)
Zero
Mean
Median
Mode
Constant
Recode
Quantiles
K-Means
Equal Width
Indicator variable / Join Categories
As Categorical / As Numeric
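The imputation and recoding options can be approximated in base R as follows. This is a sketch with made-up data; the helpers impute() and stat_mode() are hypothetical names introduced here, and the K-Means binning option is not shown.

```r
age <- c(23, 35, NA, 41, 35, NA, 58, 62)   # made-up data with missing values

# Impute: replace missing values with a single fill value.
impute <- function(x, value) { x[is.na(x)] <- value; x }
age_zero   <- impute(age, 0)
age_mean   <- impute(age, mean(age, na.rm = TRUE))
age_median <- impute(age, median(age, na.rm = TRUE))

# Mode: the most frequent observed value (table() ignores NA by default).
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}
age_mode <- impute(age, stat_mode(age))

# Recode: bin into 3 groups by equal width and by quantiles.
equal_width <- cut(age_median, breaks = 3)
by_quantile <- cut(age_median,
                   breaks = quantile(age_median, probs = seq(0, 1, length.out = 4)),
                   include.lowest = TRUE)

# Indicator variables: one 0/1 column per category.
indicators <- model.matrix(~ equal_width - 1)

# As Categorical / As Numeric: convert between factor and number.
as_cat <- as.factor(age_median)
as_num <- as.numeric(as.character(as_cat))
```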
Transform
Cleanup
Delete Ignored
Delete Selected
Delete Missing
Delete Observations with Missing
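A rough base R equivalent of the cleanup options, using a made-up data frame. It assumes that "Delete Missing" drops variables containing any missing value, while "Delete Observations with Missing" drops rows; the variable marked as ignored is also an assumption.

```r
# Made-up data frame with missing values in two columns.
ds <- data.frame(id     = 1:5,
                 age    = c(23, NA, 41, 35, 58),
                 notes  = c("a", "b", NA, "d", "e"),
                 income = c(50, 60, 55, 70, 65))

ignored <- c("notes")   # variables given the Ignore role (assumed)

# Delete Ignored: drop the ignored variables.
no_ignored <- ds[, !(names(ds) %in% ignored), drop = FALSE]

# Delete Missing (assumed meaning): drop variables that contain any NA.
no_missing_vars <- ds[, colSums(is.na(ds)) == 0, drop = FALSE]

# Delete Observations with Missing: drop rows that contain any NA.
complete_obs <- na.omit(ds)
```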
Exploration
Summary
Summary
- Min, Max, Mean, and Quartile Values.
Describe
- Missing, Unique, Sum, Mean, Lowest, and Highest Values.
Basics (For Numeric Values)
- Measures of Numeric Data (Missing, Min, Max, Quartiles, Mean, Sum, Skewness, Kurtosis)
Kurtosis (For Numeric Values)
- A larger value indicates a sharper peak.
- A lower value indicates a smoother peak.
Skewness (For Numeric Values)
- A positive skew indicates that the tail to the right is longer.
- A negative skew indicates that the tail to the left is longer.
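The summary measures can be reproduced by hand in base R. The sketch below uses a made-up, right-skewed sample and the plain moment-based definitions of skewness and kurtosis; packages (including whatever Rattle calls internally) may apply bias corrections or report excess kurtosis instead.

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 10)   # made-up sample with a long right tail

summary(x)   # Min, quartiles, median, mean, max

# Central moments of the sample.
m  <- mean(x)
m2 <- mean((x - m)^2)
m3 <- mean((x - m)^3)
m4 <- mean((x - m)^4)

skewness <- m3 / m2^1.5   # > 0: right tail is longer; < 0: left tail is longer
kurtosis <- m4 / m2^2     # 3 for a normal distribution under this definition

c(skewness = skewness, kurtosis = kurtosis)
```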
Exploration
Summary
Show Missing
- Each row corresponds to a pattern of missing values.
- Can help in understanding why the data is missing.
- Rows and columns are sorted in ascending order of missing data.
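A missing-value pattern table of this kind can be approximated in base R. This is a sketch with made-up data, not the code Rattle runs; each pattern string has one character per variable, with 1 for observed and 0 for missing.

```r
# Made-up data frame with missing values.
ds <- data.frame(age    = c(23, NA, 41, NA, 58),
                 income = c(50, 60, NA, NA, 65),
                 region = c("N", "S", "S", "N", "N"))

# 1 = observed, 0 = missing, for every cell.
observed <- (!is.na(ds)) + 0

# Collapse each row into a pattern string and count how often each occurs.
pattern <- apply(observed, 1, paste, collapse = "")
pattern_counts <- sort(table(pattern), decreasing = TRUE)
print(pattern_counts)

# Number of missing values per variable, in ascending order.
missing_per_var <- sort(colSums(is.na(ds)))
print(missing_per_var)
```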
Exploration
Distributions (review the distributions of each variable in dataset)
Annotate (include numeric values in plots)
Group by
Numeric Outputs:
- Box Plot
- Histogram
- Cumulative
- Benford
- For any number of continuous variables
- Pairs
Categorical Outputs:
- Bar Plot
- Dot Plot
- Mosaic
- Pairs
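The Benford option compares the leading digits of a variable with the proportions predicted by Benford's law, log10(1 + 1/d) for digit d. A minimal sketch with simulated data (the data and the seed are arbitrary choices for illustration):

```r
# Expected first-digit proportions under Benford's law.
digits   <- 1:9
expected <- log10(1 + 1 / digits)

# Simulated positive data spanning six orders of magnitude.
set.seed(42)
x <- exp(runif(1000, min = 0, max = log(1e6)))

# Leading digit of each value, then the observed proportion per digit.
first_digit <- floor(x / 10^floor(log10(x)))
observed <- as.numeric(table(factor(first_digit, levels = digits))) / length(x)

round(rbind(expected, observed), 3)

# The other numeric views map onto base graphics, for example:
# boxplot(x); hist(x); plot(ecdf(x)); pairs(iris[, 1:4])
```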
Exploration
Correlations (Rattle only computes correlations between numeric variables at this time)
Ordered
- Order by strength of correlation
Explore Missing
- Correlation between missing values
Hierarchical
- Pearson
- Kendall
- Spearman
Principal Components
SVD
- For numeric variables only
Eigen
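The correlation and principal component options correspond to functions in base R's stats package. The sketch below uses the built-in iris data; the distance 1 - |correlation| for the hierarchical view is an assumption made here, not necessarily what Rattle uses.

```r
num <- iris[, 1:4]   # built-in dataset, numeric columns only

# The three correlation methods.
cor_pearson  <- cor(num, method = "pearson")
cor_kendall  <- cor(num, method = "kendall")
cor_spearman <- cor(num, method = "spearman")

# Hierarchical: cluster the variables, treating 1 - |correlation| as a distance (assumed).
var_tree <- hclust(as.dist(1 - abs(cor_pearson)))

# Principal components: prcomp() works via SVD, princomp() via an eigen decomposition.
pc_svd   <- prcomp(num, scale. = TRUE)
pc_eigen <- princomp(num, cor = TRUE)

summary(pc_svd)
```

On standardized data the two principal component routines should give the same component variances; they differ mainly in numerical method.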
Model
Tree
Traditional
- Trade-off between performance and simplicity of explanation
Conditional
Forest (many decision trees using random subsets of data and variables)
Number of Trees
Number of Variables
Impute (set median numeric value for missing values)
Sample Size (for balancing classes)
Importance (variable importance)
Rules (collection of random forest rules)
ROC (ROC Curve)
Errors
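A traditional decision tree can be fitted from the R console with the rpart package. This is a sketch on the built-in iris data with an arbitrary train/test split; it assumes rpart is installed (it ships with most R distributions as a recommended package). Forests need an additional package such as randomForest and are not shown.

```r
library(rpart)   # recommended package; assumed to be installed

# Arbitrary split of the built-in iris data into training and test sets.
set.seed(1)
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Fit a classification tree predicting Species from all other variables.
tree <- rpart(Species ~ ., data = train, method = "class")
print(tree)

# Evaluate on the held-out observations.
pred      <- predict(tree, newdata = test, type = "class")
accuracy  <- mean(pred == test$Species)
confusion <- table(actual = test$Species, predicted = pred)
print(confusion)
```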
Model
SVM
Starts with two parallel vectors
Linear (linear regression)
For continuous values
All
Cluster
K-Means
K (the number of clusters) must be set first
EwKm
K-Means with entropy weighting
Hierarchical
No need to set the number of clusters first
BiCluster
Suitable subsets of both the variables and the observations
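The difference between K-Means and hierarchical clustering shows up directly in base R: kmeans() needs the number of clusters up front, while hclust() builds the full tree and the number of clusters is chosen afterwards. A sketch on the built-in iris data (the choice of 3 clusters and of Ward linkage is arbitrary):

```r
num <- scale(iris[, 1:4])   # numeric columns, standardized

# K-Means: the number of clusters must be given before fitting.
set.seed(1)
km <- kmeans(num, centers = 3, nstart = 10)
table(km$cluster)

# Hierarchical: build the tree first, pick the number of clusters afterwards.
hc <- hclust(dist(num), method = "ward.D2")
hc_groups <- cutree(hc, k = 3)
table(hc_groups)

# plot(hc) would draw the dendrogram used to choose where to cut.
```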

Rattle Graphical Interface for R Language


Editor's Notes

  1. The intensity of the color is maximal for a perfect correlation, and minimal (white) if there is no correlation. Shades of red are used for negative correlations and blue for positive correlations.