SlideShare a Scribd company logo
Hands-on Data Science with R
Presented by: Nimrita Koul
Title: Assistant Professor
School: School of Computing & IT
Affiliation: REVA University, Bengaluru
Agenda
Pre Lunch Session: R Fundamentals
1.1 What is R? RStudio?
1.2 Installing R and RStudio
1.3 Data Types and Structures – vector, matrix, data frames, list
1.4 Control Flow
1.5 Data Import and Export
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Agenda…
Post Lunch Session: Data Analysis with R
2.1 Working with iris data set
---2.1.1 Exploring individual variables of iris dataset
---2.1.2 Plotting Histogram, scatter plot, table, pie chart
---2.1.3 scatterplot3d
---2.1.4 contour
---2.1.5 persp
---2.1.6Saving charts to files
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Agenda
2.2 Classification
2.2.1 Decision Trees
2.3 Regression - Linear Regression
2.4 Clustering
2.4.1 K means clustering
2.5 Association Rule mining on Titanic Dataset using Apriori
Algorithm
2.6 Sentiment Analysis of Twitter Text
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Session 1
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
What is R?
R is a free software environment for statistical computing and
graphics.
R can be easily extended with 9,200+ packages available on CRAN (as
of Oct 2016).
http://www.r-project.org/
http://cran.r-project.org/
http://www.bioconductor.org/
http://cran.r-project.org/manuals.html
A great and complete documentation on R is available at:
http://cran.r-project.org/doc/manuals/R-intro.pdf
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Installing R
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Installing R
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Completing R
Installation
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
What is RStudio
It is an integrated development environment (IDE) for R
It runs on various operating systems like Windows, Mac OS X and Linux
RStudio project has following folders for default content types:
code: source code
data: raw data, cleaned data
figures: charts and graphs
 docs: documents and reports
 models: analytics models
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Installing RStudio
Follow the steps below for installing R Studio:
 1. Go to https://www.rstudio.com/products/rstudio/download/
 2. In ‘Installers for Supported Platforms’ section, choose and
click the
R Studio installer based on your operating system.
The download should begin as soon as you click.
3. Click Next..Next..Finish.
Download Complete.
To Start R Studio, click on its desktop icon
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Installing R Studio
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Installing R Studio
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
RStudio window looks
like this
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Data Types & Structures
Data types
 Integer
Numeric
Character
Factor
Logical
Data structures
Vector
Matrix
Data frame
List
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Vector and vector
arithmetic
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Matrix
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Data Frame
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
List
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Control Flow…for loop
for(i in 1:5)
print(1:i)
for(i in 1:5)
print(2:i)
for(i in 1:5)
print(3:i)
for(i in 1:5)
print(4:i)
for(i in 1:5)
print(5:i)
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Control Flow .. If
statement
x <- 1:15
if (sample(x, 2) <= 10) {
print("x is less than 10")
} else
{
print("x is greater than 10")
}
barplot(table(sample(1:3, size=1000, replace=TRUE,
prob=c(.30,.60,.10))))
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Data Import and Export
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Save and Load R
Objects
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Import from and Export
to .csv Files
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Session 2 : Data Analysis
with R
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Working with iris data
set
We use iris dataset for demonstration of data exploration with R. See
The dimension and names of data can be obtained respectively with
dim() and names(). Functions str() and attributes() return the structure
and attributes of data.
Dim(iris)
Names(iris)
Str(iris)
Attributes(iris)
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Structure of iris dataset
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Working with iris dataset
>iris[1:5] ----to print all 5 columns of iris data set
>iris[1:2] --- to print first two columns of iris data set
>iris[5] – to print only 5th
column of iris data set
>head(iris) --- first five records of iris data set
>tail(iris) --- last five records of iris data set
>iris[1:10, "Sepal.Length"]---- first 10 values in column Sepal.Length
> iris$Sepal.Length[1:10]-----same as above
Individual variables with summary(iris)
>summary(iris)
>quantile(iris$Sepal.Length)
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Visualization & Graphics
Histogram
> hist(iris$Sepal.Length)
Density Plot
> plot(density(iris$Sepal.Length))
Table
> table(iris$Species)
Pie Chart
> pie(table(iris$Species))
Box Plot
>boxplot(Sepal.Length~Species,data=iris)
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Visualization and
Graphics
Scatter plot
A scatter plot can be drawn for two numeric variables with plot() as
below. Using function with(), we don't need to add “iris$" before
variable names. In the code below, the colors (col) and symbols (pch) of
points are set to Species.
> with(iris, plot(Sepal.Length, Sepal.Width, col=Species,
pch=as.numeric(Species)))
When there are many points, some of them may overlap. We can use
jitter() to add a small amount of noise to the data before plotting
>plot(jitter(iris$Sepal.Length), jitter(iris$Sepal.Width))
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
3d scatter plot and interactive3d
plots
Install.packages(“scatterplot3d”)
A 3D scatter plot can be produced with package scatterplot3d
We first need to install package scatterplot3d and load it using
library(scatterplot3d)
> library(scatterplot3d)
> scatterplot3d(iris$Petal.Width, iris$Sepal.Length, iris$Sepal.Width)
For interactive 3d plot
>library(rgl)
>plot3d(iris$Petal.Width, iris$Sepal.Length, iris$Sepal.Width)
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Heatmap and Contour
plot
A heat map presents a 2D display of a data matrix, which can be
generated with heatmap() in R. We calculate the similarity between
different lowers in the iris data with dist() and then plot it with a heat
map.
> distMatrix <- as.matrix(dist(iris[,1:4]))
> heatmap(distMatrix)
Contour plot needs lattice library
>library(lattice)
>filled.contour(volcano, color=terrain.colors, asp=1,
+ plot.axes=contour(volcano, add=T))
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Advanced graphics with
ggplot2
Perspective -persp
Another way to illustrate a numeric matrix is a 3D surface plot shown as
below, which is generated with function persp().
> persp(volcano, theta=25, phi=30, expand=0.5, col="lightblue")
Package ggplot2 supports complex graphics, which are very useful for
exploring data
>library(ggplot2)
> qplot(Sepal.Length, Sepal.Width, data=iris, facets=Species ~.)
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Saving charts to pdf
>pdf("myplot.pdf")
>x=1:50
>plot(x,log(x))
>graphics.off()
# file myplot.pdf will contain the resulting plot.
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Covariance and
Correlation
> cov(iris$Sepal.Length, iris$Petal.Length)
>cov(iris[,1:4])
> cor(iris$Sepal.Length, iris$Petal.Length)
> cor(iris[,1:4])
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Classification with Decision
Trees
We can build a decision tree for the iris data with function ctree() in
package “party”. Sepal.Length, Sepal.Width, Petal.Length and
Petal.Width are used to predict the Species of flowers.
In the package, function ctree() builds a decision tree, and predict()
makes prediction for new data.
Before modeling, the iris data is split below into two subsets: training
(70%) and test (30%).
The random seed is set to a fixed value below to make the results
reproducible
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Decision Trees
> set.seed(1234)
> ind <- sample(2, nrow(iris), replace=TRUE, prob=c(0.7, 0.3))
> trainData <- iris[ind==1,]
> testData <- iris[ind==2,]
> library(party) #party package is required to create decision trees
> myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length +
Petal.Width
> iris_ctree <- ctree(myFormula, data=trainData)
> # check the prediction
> table(predict(iris_ctree), trainData$Species)
> print(iris_ctree)
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Linear Regression
Regression is to build a function of independent variables (also known
as predictors) to predict a dependent variable (also called response).
For example, banks assess the risk of home-loan applicants based on
their age, income, expenses, occupation, number of dependents, total
credit limit, etc.
y = c0 + c1x1 + c2x2 + + ckxk;
where x1; x2; ; xk are predictors and y is the response to predict.
We demonstrate linear regression using function lm() on Australian
Consumer Price Index(cpi) data , quarterly from 2008 to 2010.
Data is created and plotted, x axis is manually added with function
axis(), las=3 makes text vertical.
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Linear Regression
>year=rep(2008:2010,each=4)
> quarter=rep(1:4,3)
>
cpi=c(162.2,164.6,166.5,166.0,166.2,167.0,168.6,169.5,171.0,172.1,17
3.3,174.0)
> plot(cpi,xaxt="n",ylab="CPI",xlab="")
#draw x axis
> axis(1,labels=paste(year,quarter,sep="Q"),at=1:12,las=3)
> cor(year,cpi)
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Checking correlation
and plotting 3d
> cor(quarter,cpi)
> fit<-lm(cpi~year+quarter)
> fit
>plot(fit)
>attributes(fit)
>library(scatterplot3d)
>s3d<-
scatterplot3d(year,quarter,cpi,highlight.3d=T,type=“h”,lab=c(2,3))
>s3d$plane3d(fit)
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Linear Regression
With the above linear model, CPI is calculated as
cpi = c0 + c1 year + c2 quarter;
where c0, c1 and c2 are coefficients from model fit.
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Clustering- K - Means
We remove species from the data to cluster, thenapply function
kmeans() to iris2, and store the clustering result in kmeans.result. The
cluster number is set to 3 in the code below.
>iris2=iris
>iris2$Species=NULL
>(kmeans.result=kmeans(iris2,3))
>table(iris$Species,kmeans.result$cluster)
The above result shows that cluster setosa" can be easily separated
from the other clusters, and that clusters versicolor" and virginica"
are to a small degree overlapped with each other.
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Plotting clusters
> plot(iris2[c("Sepal.Length", "Sepal.Width")], col =
kmeans.result$cluster)
# plot cluster centers
> points(kmeans.result$centers[,c("Sepal.Length", "Sepal.Width")], col
= 1:3, pch = 8, cex=2)
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Association Rule Mining
on Titanic dataset
Attributes - Passenger class, Gender, Child, Survived
> str(Titanic)
table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
- attr(*, "dimnames")=List of 4
..$ Class : chr [1:4] "1st" "2nd" "3rd" "Crew"
..$ Sex : chr [1:2] "Male" "Female"
..$ Age : chr [1:2] "Child" "Adult"
..$ Survived: chr [1:2] "No" "Yes"
> df <- as.data.frame(Titanic)
> head(df)
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Titanic Dataset
Class Sex Age Survived Freq
1 1st Male Child No 0
2 2nd Male Child No 0
3 3rd Male Child No 35
4 Crew Male Child No 0
5 1st Female Child No 0
6 2nd Female Child No 0
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Titanic Raw
> titanic.raw <- NULL
> for(i in 1:4) {
+ titanic.raw <- cbind(titanic.raw, rep(as.character(df[,i]), df$Freq))
+ }
> titanic.raw <- as.data.frame(titanic.raw)
> names(titanic.raw) <- names(df)[1:4]
> str(titanic.raw)
> dim(titanic.raw)
> str(titanic.raw)
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Titanic Raw
> head(titanic.raw)
> summary(titanic.raw)
> library(arules)
> # find association rules with default settings
> rules.all <- apriori(titanic.raw)
> rules.all
>inspect(rules.all)
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Apriori Algorithm
>rules <- apriori(titanic.raw, control = list(verbose=F),
+ parameter = list(minlen=2, supp=0.005, conf=0.8),
+ appearance = list(rhs=c("Survived=No", "Survived=Yes"),
+ default="lhs"))
> quality(rules) <- round(quality(rules), digits=3)
> rules.sorted <- sort(rules, by="lift")
> inspect(rules.sorted)
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Removing redundant
rules
> subset.matrix <- is.subset(rules.sorted, rules.sorted)
> subset.matrix[lower.tri(subset.matrix, diag=T)] <- NA
> redundant <- colSums(subset.matrix, na.rm=T) >= 1
> which(redundant)
> # remove redundant rules
> rules.pruned <- rules.sorted[!redundant]
> inspect(rules.pruned)
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
Visualizing Association
Rules
> library(arulesViz)
> plot(rules.all)
> plot(rules.all, method="grouped")
> plot(rules.all, method="graph")
> plot(rules.all, method="graph", control=list(type="items"))
HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY
MANY OBSTACLES OF YOUR PATH.
THANK YOU

More Related Content

What's hot

Sparklyr
SparklyrSparklyr
Stata cheat sheet: data transformation
Stata  cheat sheet: data transformationStata  cheat sheet: data transformation
Stata cheat sheet: data transformation
Tim Essam
 
Stata cheat sheet: data processing
Stata cheat sheet: data processingStata cheat sheet: data processing
Stata cheat sheet: data processing
Tim Essam
 
Next Generation Programming in R
Next Generation Programming in RNext Generation Programming in R
Next Generation Programming in R
Florian Uhlitz
 
Data manipulation on r
Data manipulation on rData manipulation on r
Data manipulation on r
Abhik Seal
 
Python for R developers and data scientists
Python for R developers and data scientistsPython for R developers and data scientists
Python for R developers and data scientists
Lambda Tree
 
Stata cheatsheet transformation
Stata cheatsheet transformationStata cheatsheet transformation
Stata cheatsheet transformation
Laura Hughes
 
Data handling in r
Data handling in rData handling in r
Data handling in r
Abhik Seal
 
Stata Programming Cheat Sheet
Stata Programming Cheat SheetStata Programming Cheat Sheet
Stata Programming Cheat Sheet
Laura Hughes
 
Most useful queries
Most useful queriesMost useful queries
Most useful queries
Sam Depp
 
Data transformation-cheatsheet
Data transformation-cheatsheetData transformation-cheatsheet
Data transformation-cheatsheet
Dieudonne Nahigombeye
 
R gráfico
R gráficoR gráfico
R gráfico
stryper1968
 
Introduction to Pandas and Time Series Analysis [Budapest BI Forum]
Introduction to Pandas and Time Series Analysis [Budapest BI Forum]Introduction to Pandas and Time Series Analysis [Budapest BI Forum]
Introduction to Pandas and Time Series Analysis [Budapest BI Forum]
Alexander Hendorf
 
3 R Tutorial Data Structure
3 R Tutorial Data Structure3 R Tutorial Data Structure
3 R Tutorial Data Structure
Sakthi Dasans
 
R intro 20140716-advance
R intro 20140716-advanceR intro 20140716-advance
R intro 20140716-advance
Kevin Chun-Hsien Hsu
 
Pandas
PandasPandas
Pandas
maikroeder
 
R Programming: Export/Output Data In R
R Programming: Export/Output Data In RR Programming: Export/Output Data In R
R Programming: Export/Output Data In R
Rsquared Academy
 
R Brown-bag seminars : Seminar-8
R Brown-bag seminars : Seminar-8R Brown-bag seminars : Seminar-8
R Brown-bag seminars : Seminar-8
Muhammad Nabi Ahmad
 
R Programming: Learn To Manipulate Strings In R
R Programming: Learn To Manipulate Strings In RR Programming: Learn To Manipulate Strings In R
R Programming: Learn To Manipulate Strings In R
Rsquared Academy
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Serban Tanasa
 

What's hot (20)

Sparklyr
SparklyrSparklyr
Sparklyr
 
Stata cheat sheet: data transformation
Stata  cheat sheet: data transformationStata  cheat sheet: data transformation
Stata cheat sheet: data transformation
 
Stata cheat sheet: data processing
Stata cheat sheet: data processingStata cheat sheet: data processing
Stata cheat sheet: data processing
 
Next Generation Programming in R
Next Generation Programming in RNext Generation Programming in R
Next Generation Programming in R
 
Data manipulation on r
Data manipulation on rData manipulation on r
Data manipulation on r
 
Python for R developers and data scientists
Python for R developers and data scientistsPython for R developers and data scientists
Python for R developers and data scientists
 
Stata cheatsheet transformation
Stata cheatsheet transformationStata cheatsheet transformation
Stata cheatsheet transformation
 
Data handling in r
Data handling in rData handling in r
Data handling in r
 
Stata Programming Cheat Sheet
Stata Programming Cheat SheetStata Programming Cheat Sheet
Stata Programming Cheat Sheet
 
Most useful queries
Most useful queriesMost useful queries
Most useful queries
 
Data transformation-cheatsheet
Data transformation-cheatsheetData transformation-cheatsheet
Data transformation-cheatsheet
 
R gráfico
R gráficoR gráfico
R gráfico
 
Introduction to Pandas and Time Series Analysis [Budapest BI Forum]
Introduction to Pandas and Time Series Analysis [Budapest BI Forum]Introduction to Pandas and Time Series Analysis [Budapest BI Forum]
Introduction to Pandas and Time Series Analysis [Budapest BI Forum]
 
3 R Tutorial Data Structure
3 R Tutorial Data Structure3 R Tutorial Data Structure
3 R Tutorial Data Structure
 
R intro 20140716-advance
R intro 20140716-advanceR intro 20140716-advance
R intro 20140716-advance
 
Pandas
PandasPandas
Pandas
 
R Programming: Export/Output Data In R
R Programming: Export/Output Data In RR Programming: Export/Output Data In R
R Programming: Export/Output Data In R
 
R Brown-bag seminars : Seminar-8
R Brown-bag seminars : Seminar-8R Brown-bag seminars : Seminar-8
R Brown-bag seminars : Seminar-8
 
R Programming: Learn To Manipulate Strings In R
R Programming: Learn To Manipulate Strings In RR Programming: Learn To Manipulate Strings In R
R Programming: Learn To Manipulate Strings In R
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
 

Similar to Hands on data science with r.pptx

Hadoop
HadoopHadoop
Lecture1_R.ppt
Lecture1_R.pptLecture1_R.ppt
Lecture1_R.ppt
ArchishaKhandareSS20
 
Lecture1_R.ppt
Lecture1_R.pptLecture1_R.ppt
Lecture1_R.ppt
vikassingh569137
 
Lecture1 r
Lecture1 rLecture1 r
Lecture1 r
Sandeep242951
 
Modeling in R Programming Language for Beginers.ppt
Modeling in R Programming Language for Beginers.pptModeling in R Programming Language for Beginers.ppt
Modeling in R Programming Language for Beginers.ppt
anshikagoel52
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in R
Alexandros Karatzoglou
 
PMED Undergraduate Workshop - R Tutorial for PMED Undegraduate Workshop - Xi...
PMED Undergraduate Workshop - R Tutorial for PMED Undegraduate Workshop  - Xi...PMED Undergraduate Workshop - R Tutorial for PMED Undegraduate Workshop  - Xi...
PMED Undergraduate Workshop - R Tutorial for PMED Undegraduate Workshop - Xi...
The Statistical and Applied Mathematical Sciences Institute
 
Lecture1_R.pdf
Lecture1_R.pdfLecture1_R.pdf
Lecture1_R.pdf
BusyBird2
 
R basics
R basicsR basics
R basics
Sagun Baijal
 
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docxINFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
carliotwaycave
 
R workshop
R workshopR workshop
R workshop
Revanth19921
 
2015-10-23_wim_davis_r_slides.pptx on consumer
2015-10-23_wim_davis_r_slides.pptx on consumer2015-10-23_wim_davis_r_slides.pptx on consumer
2015-10-23_wim_davis_r_slides.pptx on consumer
tirlukachaitanya
 
ComputeFest 2012: Intro To R for Physical Sciences
ComputeFest 2012: Intro To R for Physical SciencesComputeFest 2012: Intro To R for Physical Sciences
ComputeFest 2012: Intro To R for Physical Sciences
alexstorer
 
STAT-522 (Data Analysis Using R) by SOUMIQUE AHAMED.pdf
STAT-522 (Data Analysis Using R) by SOUMIQUE AHAMED.pdfSTAT-522 (Data Analysis Using R) by SOUMIQUE AHAMED.pdf
STAT-522 (Data Analysis Using R) by SOUMIQUE AHAMED.pdf
SOUMIQUE AHAMED
 
Presentation R basic teaching module
Presentation R basic teaching modulePresentation R basic teaching module
Presentation R basic teaching module
Sander Timmer
 
Rattle Graphical Interface for R Language
Rattle Graphical Interface for R LanguageRattle Graphical Interface for R Language
Rattle Graphical Interface for R Language
Majid Abdollahi
 
Apache Spark: What? Why? When?
Apache Spark: What? Why? When?Apache Spark: What? Why? When?
Apache Spark: What? Why? When?
Massimo Schenone
 
e_lumley.pdf
e_lumley.pdfe_lumley.pdf
e_lumley.pdf
betsegaw123
 
R Introduction
R IntroductionR Introduction
R Introduction
Sangeetha S
 
Do snow.rwn
Do snow.rwnDo snow.rwn
Do snow.rwn
ARUN DN
 

Similar to Hands on data science with r.pptx (20)

Hadoop
HadoopHadoop
Hadoop
 
Lecture1_R.ppt
Lecture1_R.pptLecture1_R.ppt
Lecture1_R.ppt
 
Lecture1_R.ppt
Lecture1_R.pptLecture1_R.ppt
Lecture1_R.ppt
 
Lecture1 r
Lecture1 rLecture1 r
Lecture1 r
 
Modeling in R Programming Language for Beginers.ppt
Modeling in R Programming Language for Beginers.pptModeling in R Programming Language for Beginers.ppt
Modeling in R Programming Language for Beginers.ppt
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in R
 
PMED Undergraduate Workshop - R Tutorial for PMED Undegraduate Workshop - Xi...
PMED Undergraduate Workshop - R Tutorial for PMED Undegraduate Workshop  - Xi...PMED Undergraduate Workshop - R Tutorial for PMED Undegraduate Workshop  - Xi...
PMED Undergraduate Workshop - R Tutorial for PMED Undegraduate Workshop - Xi...
 
Lecture1_R.pdf
Lecture1_R.pdfLecture1_R.pdf
Lecture1_R.pdf
 
R basics
R basicsR basics
R basics
 
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docxINFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
 
R workshop
R workshopR workshop
R workshop
 
2015-10-23_wim_davis_r_slides.pptx on consumer
2015-10-23_wim_davis_r_slides.pptx on consumer2015-10-23_wim_davis_r_slides.pptx on consumer
2015-10-23_wim_davis_r_slides.pptx on consumer
 
ComputeFest 2012: Intro To R for Physical Sciences
ComputeFest 2012: Intro To R for Physical SciencesComputeFest 2012: Intro To R for Physical Sciences
ComputeFest 2012: Intro To R for Physical Sciences
 
STAT-522 (Data Analysis Using R) by SOUMIQUE AHAMED.pdf
STAT-522 (Data Analysis Using R) by SOUMIQUE AHAMED.pdfSTAT-522 (Data Analysis Using R) by SOUMIQUE AHAMED.pdf
STAT-522 (Data Analysis Using R) by SOUMIQUE AHAMED.pdf
 
Presentation R basic teaching module
Presentation R basic teaching modulePresentation R basic teaching module
Presentation R basic teaching module
 
Rattle Graphical Interface for R Language
Rattle Graphical Interface for R LanguageRattle Graphical Interface for R Language
Rattle Graphical Interface for R Language
 
Apache Spark: What? Why? When?
Apache Spark: What? Why? When?Apache Spark: What? Why? When?
Apache Spark: What? Why? When?
 
e_lumley.pdf
e_lumley.pdfe_lumley.pdf
e_lumley.pdf
 
R Introduction
R IntroductionR Introduction
R Introduction
 
Do snow.rwn
Do snow.rwnDo snow.rwn
Do snow.rwn
 

More from Nimrita Koul

Tools for research plotting
Tools for research plottingTools for research plotting
Tools for research plotting
Nimrita Koul
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
Nimrita Koul
 
Deeplearning
Deeplearning Deeplearning
Deeplearning
Nimrita Koul
 
Structures in C
Structures in CStructures in C
Structures in C
Nimrita Koul
 
Templates and Exception Handling in C++
Templates and Exception Handling in C++Templates and Exception Handling in C++
Templates and Exception Handling in C++
Nimrita Koul
 
Shorter bioinformatics
Shorter bioinformaticsShorter bioinformatics
Shorter bioinformatics
Nimrita Koul
 
Linear regression analysis
Linear regression analysisLinear regression analysis
Linear regression analysis
Nimrita Koul
 
Nimrita deep learning
Nimrita deep learningNimrita deep learning
Nimrita deep learning
Nimrita Koul
 
Nimrita koul Machine Learning
Nimrita koul  Machine LearningNimrita koul  Machine Learning
Nimrita koul Machine Learning
Nimrita Koul
 
Python Traning presentation
Python Traning presentationPython Traning presentation
Python Traning presentation
Nimrita Koul
 

More from Nimrita Koul (10)

Tools for research plotting
Tools for research plottingTools for research plotting
Tools for research plotting
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Deeplearning
Deeplearning Deeplearning
Deeplearning
 
Structures in C
Structures in CStructures in C
Structures in C
 
Templates and Exception Handling in C++
Templates and Exception Handling in C++Templates and Exception Handling in C++
Templates and Exception Handling in C++
 
Shorter bioinformatics
Shorter bioinformaticsShorter bioinformatics
Shorter bioinformatics
 
Linear regression analysis
Linear regression analysisLinear regression analysis
Linear regression analysis
 
Nimrita deep learning
Nimrita deep learningNimrita deep learning
Nimrita deep learning
 
Nimrita koul Machine Learning
Nimrita koul  Machine LearningNimrita koul  Machine Learning
Nimrita koul Machine Learning
 
Python Traning presentation
Python Traning presentationPython Traning presentation
Python Traning presentation
 

Recently uploaded

Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
UReason
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
Atif Razi
 
5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf
AlvianRamadhani5
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
ijaia
 
Supermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdfSupermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdf
Kamal Acharya
 
smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...
um7474492
 
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
upoux
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
Roger Rozario
 
Mechatronics material . Mechanical engineering
Mechatronics material . Mechanical engineeringMechatronics material . Mechanical engineering
Mechatronics material . Mechanical engineering
sachin chaurasia
 
Generative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdfGenerative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdf
mahaffeycheryld
 
SCALING OF MOS CIRCUITS m .pptx
SCALING OF MOS CIRCUITS m                 .pptxSCALING OF MOS CIRCUITS m                 .pptx
SCALING OF MOS CIRCUITS m .pptx
harshapolam10
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
Divyanshu
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
ydzowc
 
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by AnantLLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
Anant Corporation
 
AI-Based Home Security System : Home security
AI-Based Home Security System : Home securityAI-Based Home Security System : Home security
AI-Based Home Security System : Home security
AIRCC Publishing Corporation
 
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
nedcocy
 
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
shadow0702a
 
Accident detection system project report.pdf
Accident detection system project report.pdfAccident detection system project report.pdf
Accident detection system project report.pdf
Kamal Acharya
 
Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...
Prakhyath Rai
 
Introduction to Computer Networks & OSI MODEL.ppt
Introduction to Computer Networks & OSI MODEL.pptIntroduction to Computer Networks & OSI MODEL.ppt
Introduction to Computer Networks & OSI MODEL.ppt
Dwarkadas J Sanghvi College of Engineering
 

Recently uploaded (20)

Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
 
5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf5G Radio Network Througput Problem Analysis HCIA.pdf
5G Radio Network Througput Problem Analysis HCIA.pdf
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
 
Supermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdfSupermarket Management System Project Report.pdf
Supermarket Management System Project Report.pdf
 
smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...smart pill dispenser is designed to improve medication adherence and safety f...
smart pill dispenser is designed to improve medication adherence and safety f...
 
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
一比一原版(uofo毕业证书)美国俄勒冈大学毕业证如何办理
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
 
Mechatronics material . Mechanical engineering
Mechatronics material . Mechanical engineeringMechatronics material . Mechanical engineering
Mechatronics material . Mechanical engineering
 
Generative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdfGenerative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdf
 
SCALING OF MOS CIRCUITS m .pptx
SCALING OF MOS CIRCUITS m                 .pptxSCALING OF MOS CIRCUITS m                 .pptx
SCALING OF MOS CIRCUITS m .pptx
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
 
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by AnantLLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
 
AI-Based Home Security System : Home security
AI-Based Home Security System : Home securityAI-Based Home Security System : Home security
AI-Based Home Security System : Home security
 
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
 
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
 
Accident detection system project report.pdf
Accident detection system project report.pdfAccident detection system project report.pdf
Accident detection system project report.pdf
 
Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...
 
Introduction to Computer Networks & OSI MODEL.ppt
Introduction to Computer Networks & OSI MODEL.pptIntroduction to Computer Networks & OSI MODEL.ppt
Introduction to Computer Networks & OSI MODEL.ppt
 

Hands on data science with r.pptx

  • 1. Hands-on Data Science with R Presented by: Nimrita Koul Title: Assistant Professor School: School of Computing & IT Affiliation: REVA University, Bengaluru
  • 2. Agenda Pre Lunch Session: R Fundamentals 1.1 What is R? RStudio? 1.2 Installing R and RStudio 1.3 Data Types and Structures – vector, matrix, data frames, list 1.4 Control Flow 1.5 Data Import and Export HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 3. Agenda… Post Lunch Session: Data Analysis with R 2.1 Working with iris data set ---2.1.1 Exploring individual variables of iris dataset ---2.1.2 Plotting Histogram, scatter plot, table, pie chart ---2.1.3 scatterplot3d ---2.1.4 contour ---2.1.5 persp ---2.1.6Saving charts to files HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 4. Agenda 2.2 Classification 2.2.1 Decision Trees 2.3 Regression - Linear Regression 2.4 Clustering 2.4.1 K means clustering 2.5 Association Rule mining on Titanic Dataset using Apriori Algorithm 2.6 Sentiment Analysis of Twitter Text HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 5. Session 1 HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 6. What is R? R is a free software environment for statistical computing and graphics. R can be easily extended with 9,200+ packages available on CRAN (as of Oct 2016). http://www.r-project.org/ http://cran.r-project.org/ http://www.bioconductor.org/ http://cran.r-project.org/manuals.html A great and complete documentation on R is available at: http://cran.r-project.org/doc/manuals/R-intro.pdf HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 7. Installing R HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 8. Installing R HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 9. HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 10. HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 11. HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 12. HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 13. Completing R Installation HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 14. What is RStudio It is an integrated development environment (IDE) for R It runs on various operating systems like Windows, Mac OS X and Linux RStudio project has following folders for default content types: code: source code data: raw data, cleaned data figures: charts and graphs  docs: documents and reports  models: analytics models HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 15. Installing RStudio Follow the steps below for installing R Studio:  1. Go to https://www.rstudio.com/products/rstudio/download/  2. In ‘Installers for Supported Platforms’ section, choose and click the R Studio installer based on your operating system. The download should begin as soon as you click. 3. Click Next..Next..Finish. Download Complete. To Start R Studio, click on its desktop icon HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 16. Installing R Studio HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 17. HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 18. Installing R Studio HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 19. RStudio window looks like this HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 20. Data Types & Structures Data types  Integer Numeric Character Factor Logical Data structures Vector Matrix Data frame List HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 21. Vector and vector arithmetic HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 22. Matrix HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 23. Data Frame HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 24. List HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 25. Control Flow…for loop for(i in 1:5) print(1:i) for(i in 1:5) print(2:i) for(i in 1:5) print(3:i) for(i in 1:5) print(4:i) for(i in 1:5) print(5:i) HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 26. Control Flow .. If statement x <- 1:15 if (sample(x, 2) <= 10) { print("x is less than 10") } else { print("x is greater than 10") } barplot(table(sample(1:3, size=1000, replace=TRUE, prob=c(.30,.60,.10)))) HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 27. Data Import and Export HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 28. Save and Load R Objects HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 29. Import from and Export to .csv Files HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 30. Session 2 : Data Analysis with R HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 31. Working with iris data set We use iris dataset for demonstration of data exploration with R. See The dimension and names of data can be obtained respectively with dim() and names(). Functions str() and attributes() return the structure and attributes of data. Dim(iris) Names(iris) Str(iris) Attributes(iris) HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 32. Structure of iris dataset HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 33. Working with iris dataset >iris[1:5] ----to print all 5 columns of iris data set >iris[1:2] --- to print first two columns of iris data set >iris[5] – to print only 5th column of iris data set >head(iris) --- first five records of iris data set >tail(iris) --- last five records of iris data set >iris[1:10, "Sepal.Length"]---- first 10 values in column Sepal.Length > iris$Sepal.Length[1:10]-----same as above Individual variables with summary(iris) >summary(iris) >quantile(iris$Sepal.Length) HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 34. Visualization & Graphics Histogram > hist(iris$Sepal.Length) Density Plot > plot(density(iris$Sepal.Length)) Table > table(iris$Species) Pie Chart > pie(table(iris$Species)) Box Plot >boxplot(Sepal.Length~Species,data=iris) HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 35. Visualization and Graphics Scatter plot A scatter plot can be drawn for two numeric variables with plot() as below. Using function with(), we don't need to add “iris$" before variable names. In the code below, the colors (col) and symbols (pch) of points are set to Species. > with(iris, plot(Sepal.Length, Sepal.Width, col=Species, pch=as.numeric(Species))) When there are many points, some of them may overlap. We can use jitter() to add a small amount of noise to the data before plotting >plot(jitter(iris$Sepal.Length), jitter(iris$Sepal.Width)) HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 36. 3d scatter plot and interactive3d plots Install.packages(“scatterplot3d”) A 3D scatter plot can be produced with package scatterplot3d We first need to install package scatterplot3d and load it using library(scatterplot3d) > library(scatterplot3d) > scatterplot3d(iris$Petal.Width, iris$Sepal.Length, iris$Sepal.Width) For interactive 3d plot >library(rgl) >plot3d(iris$Petal.Width, iris$Sepal.Length, iris$Sepal.Width) HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 37. Heatmap and Contour plot A heat map presents a 2D display of a data matrix, which can be generated with heatmap() in R. We calculate the similarity between different lowers in the iris data with dist() and then plot it with a heat map. > distMatrix <- as.matrix(dist(iris[,1:4])) > heatmap(distMatrix) Contour plot needs lattice library >library(lattice) >filled.contour(volcano, color=terrain.colors, asp=1, + plot.axes=contour(volcano, add=T)) HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 38. Advanced graphics with ggplot2 Perspective -persp Another way to illustrate a numeric matrix is a 3D surface plot shown as below, which is generated with function persp(). > persp(volcano, theta=25, phi=30, expand=0.5, col="lightblue") Package ggplot2 supports complex graphics, which are very useful for exploring data >library(ggplot2) > qplot(Sepal.Length, Sepal.Width, data=iris, facets=Species ~.) HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 39. Saving charts to pdf >pdf("myplot.pdf") >x=1:50 >plot(x,log(x)) >graphics.off() # file myplot.pdf will contain the resulting plot. HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 40. Covariance and Correlation > cov(iris$Sepal.Length, iris$Petal.Length) >cov(iris[,1:4]) > cor(iris$Sepal.Length, iris$Petal.Length) > cor(iris[,1:4]) HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 41. Classification with Decision Trees We can build a decision tree for the iris data with function ctree() in package “party”. Sepal.Length, Sepal.Width, Petal.Length and Petal.Width are used to predict the Species of flowers. In the package, function ctree() builds a decision tree, and predict() makes prediction for new data. Before modeling, the iris data is split below into two subsets: training (70%) and test (30%). The random seed is set to a fixed value below to make the results reproducible HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 42. Decision Trees > set.seed(1234) > ind <- sample(2, nrow(iris), replace=TRUE, prob=c(0.7, 0.3)) > trainData <- iris[ind==1,] > testData <- iris[ind==2,] > library(party) #party package is required to create decision trees > myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width > iris_ctree <- ctree(myFormula, data=trainData) > # check the prediction > table(predict(iris_ctree), trainData$Species) > print(iris_ctree) HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 43. Linear Regression Regression is to build a function of independent variables (also known as predictors) to predict a dependent variable (also called response). For example, banks assess the risk of home-loan applicants based on their age, income, expenses, occupation, number of dependents, total credit limit, etc. y = c0 + c1x1 + c2x2 + + ckxk; where x1; x2; ; xk are predictors and y is the response to predict. We demonstrate linear regression using function lm() on Australian Consumer Price Index(cpi) data , quarterly from 2008 to 2010. Data is created and plotted, x axis is manually added with function axis(), las=3 makes text vertical. HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 44. Linear Regression >year=rep(2008:2010,each=4) > quarter=rep(1:4,3) > cpi=c(162.2,164.6,166.5,166.0,166.2,167.0,168.6,169.5,171.0,172.1,17 3.3,174.0) > plot(cpi,xaxt="n",ylab="CPI",xlab="") #draw x axis > axis(1,labels=paste(year,quarter,sep="Q"),at=1:12,las=3) > cor(year,cpi) HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 45. Checking correlation and plotting 3d > cor(quarter,cpi) > fit<-lm(cpi~year+quarter) > fit >plot(fit) >attributes(fit) >library(scatterplot3d) >s3d<- scatterplot3d(year,quarter,cpi,highlight.3d=T,type=“h”,lab=c(2,3)) >s3d$plane3d(fit) HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 46. Linear Regression With the above linear model, CPI is calculated as cpi = c0 + c1 year + c2 quarter; where c0, c1 and c2 are coefficients from model fit. HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 47. Clustering- K - Means We remove species from the data to cluster, thenapply function kmeans() to iris2, and store the clustering result in kmeans.result. The cluster number is set to 3 in the code below. >iris2=iris >iris2$Species=NULL >(kmeans.result=kmeans(iris2,3)) >table(iris$Species,kmeans.result$cluster) The above result shows that cluster setosa" can be easily separated from the other clusters, and that clusters versicolor" and virginica" are to a small degree overlapped with each other. HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 48. Plotting clusters > plot(iris2[c("Sepal.Length", "Sepal.Width")], col = kmeans.result$cluster) # plot cluster centers > points(kmeans.result$centers[,c("Sepal.Length", "Sepal.Width")], col = 1:3, pch = 8, cex=2) HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 49. Association Rule Mining on Titanic dataset Attributes - Passenger class, Gender, Child, Survived > str(Titanic) table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ... - attr(*, "dimnames")=List of 4 ..$ Class : chr [1:4] "1st" "2nd" "3rd" "Crew" ..$ Sex : chr [1:2] "Male" "Female" ..$ Age : chr [1:2] "Child" "Adult" ..$ Survived: chr [1:2] "No" "Yes" > df <- as.data.frame(Titanic) > head(df) HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 50. Titanic Dataset Class Sex Age Survived Freq 1 1st Male Child No 0 2 2nd Male Child No 0 3 3rd Male Child No 35 4 Crew Male Child No 0 5 1st Female Child No 0 6 2nd Female Child No 0 HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 51. Titanic Raw > titanic.raw <- NULL > for(i in 1:4) { + titanic.raw <- cbind(titanic.raw, rep(as.character(df[,i]), df$Freq)) + } > titanic.raw <- as.data.frame(titanic.raw) > names(titanic.raw) <- names(df)[1:4] > str(titanic.raw) > dim(titanic.raw) > str(titanic.raw) HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 52. Titanic Raw > head(titanic.raw) > summary(titanic.raw) > library(arules) > # find association rules with default settings > rules.all <- apriori(titanic.raw) > rules.all >inspect(rules.all) HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 53. Apriori Algorithm >rules <- apriori(titanic.raw, control = list(verbose=F), + parameter = list(minlen=2, supp=0.005, conf=0.8), + appearance = list(rhs=c("Survived=No", "Survived=Yes"), + default="lhs")) > quality(rules) <- round(quality(rules), digits=3) > rules.sorted <- sort(rules, by="lift") > inspect(rules.sorted) HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 54. Removing redundant rules > subset.matrix <- is.subset(rules.sorted, rules.sorted) > subset.matrix[lower.tri(subset.matrix, diag=T)] <- NA > redundant <- colSums(subset.matrix, na.rm=T) >= 1 > which(redundant) > # remove redundant rules > rules.pruned <- rules.sorted[!redundant] > inspect(rules.pruned) HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.
  • 55. Visualizing Association Rules > library(arulesViz) > plot(rules.all) > plot(rules.all, method="grouped") > plot(rules.all, method="graph") > plot(rules.all, method="graph", control=list(type="items")) HAVE TOTAL FOCUS ON YOUR GOAL, IT WILL DESTROY MANY OBSTACLES OF YOUR PATH.