R Programming Guide: Statistical Analysis & Graphics with R

What is R?
• R is one of the world's most widely used programming languages for statistics.
• R is a programming language and software environment for:
   • Statistical analysis
   • Graphical representation and reporting
• R provides a suite of operators for calculations on arrays, lists, vectors and matrices.
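
As a brief illustration of these operators, the sketch below applies arithmetic to a vector, a matrix and a list using base R only. The variable names are made up for the example.

```r
# Vectors: arithmetic is applied element by element
v <- c(1, 2, 3)
v_doubled <- v * 2          # 2 4 6
v_sum <- sum(v)             # 6

# Matrices: %*% is matrix multiplication, t() is the transpose
m <- matrix(1:4, nrow = 2)  # filled column by column: rows (1, 3) and (2, 4)
m_sq <- m %*% m
m_t <- t(m)

# Lists can hold elements of different types and lengths;
# sapply() applies a function to each element
l <- list(a = 1:3, b = c(10, 20))
l_means <- sapply(l, mean)  # a = 2, b = 15

print(v_doubled)
print(m_sq)
print(l_means)
```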

History
• R is a programming language that is an implementation of the S language.
R was first designed by Ross Ihaka and Robert Gentleman
at the University of Auckland in 1993.
• A stable version was released on October 31st, 2014 by the
R Development Core Team, under the GNU General Public License.

Introduction
• R is a programming language and software environment for statistical computing
and graphics.
• The R language is widely used among statisticians for developing statistical
software and for data analysis.
• It compiles and runs on a wide variety of UNIX platforms, Windows and Mac OS.
• R can be downloaded and installed from the CRAN website. CRAN stands for
Comprehensive R Archive Network.

R - Data Types
The atomic (basic) data types in R are:
• Numeric (integer, double)
• Complex
• Character
• Logical
• Raw
Functions are also objects in R, but they are not an atomic type.
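
A quick way to check the type of a value is typeof(). The sketch below uses base R only; the sample values are arbitrary. It also shows that a function is an object in its own right rather than an atomic vector.

```r
# typeof() reports the internal storage type of a value
values <- list(42L, 3.14, 2 + 3i, "text", TRUE)
types <- sapply(values, typeof)
print(types)   # "integer" "double" "complex" "character" "logical"

# Functions are objects too, but they are not atomic vectors
fun_type <- typeof(mean)          # "closure"
fun_is_atomic <- is.atomic(mean)  # FALSE

# is.numeric() is TRUE for integer and double, but not for complex
num_check <- c(is.numeric(42L), is.numeric(3.14), is.numeric(2 + 3i))
print(num_check)   # TRUE TRUE FALSE
```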

Text Mining with R
• R is an open source language and environment for statistical computing and
graphics. Add-on packages such as tm, SnowballC, ggplot2 and wordcloud are
used to carry out the steps in text processing. The first prerequisite is that
R and RStudio are installed on your machine.

Packages Used in Text Mining
• RSQLite, SQLite interface for R
• tm, framework for text mining applications
• SnowballC, text stemming library
• wordcloud, for making word cloud visualizations
• syuzhet, text sentiment analysis
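
Before running the later steps it can help to check which of these packages are already installed. The sketch below is one possible way to do that with base R; missing_packages() is a hypothetical helper written for this example, not part of any package, and the install.packages() call is left commented out because it needs network access.

```r
# Packages named in the list above
pkgs <- c("RSQLite", "tm", "SnowballC", "wordcloud", "syuzhet")

# Hypothetical helper: returns the names in p that are not yet installed
missing_packages <- function(p) {
  installed <- vapply(p, requireNamespace, logical(1), quietly = TRUE)
  p[!installed]
}

to_install <- missing_packages(pkgs)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to download them from CRAN
}
```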

Reading SQLite data in R
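# Read the comments from a CSV file
• Comm <- read.csv("comments.csv", header = TRUE)
# Build a corpus from the comments column (requires the tm package)
• docs <- Corpus(VectorSource(Comm$comments))
# Get all the emails sent by Hillary
# (emailHillary is not defined on this slide; it is assumed to be a data frame loaded earlier)
• emailRaw <- paste(emailHillary$EmailBody, collapse = " // ")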
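
The slide title mentions SQLite, while the lines above read a CSV file. A minimal sketch of reading from SQLite with the DBI and RSQLite packages follows. The table and column names (Emails, Sender, EmailBody) and the in-memory database are hypothetical, chosen only so that the example is self-contained; a real database file path would take the place of ":memory:".

```r
library(DBI)

# Open an in-memory SQLite database (pass a file path instead for a real database)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical sample table standing in for the real email data
emails <- data.frame(
  Sender = c("A", "B", "A"),
  EmailBody = c("first message", "second message", "third message"),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "Emails", emails)

# Select the bodies from one sender and collapse them into a single string
fromA <- dbGetQuery(con, "SELECT EmailBody FROM Emails WHERE Sender = 'A'")
emailRaw <- paste(fromA$EmailBody, collapse = " // ")
print(emailRaw)

dbDisconnect(con)
```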

Cleaning Text in R
• install.packages("tm")
• install.packages("NLP")
• library("tm")  # load the text mining package
• docs <- Corpus(VectorSource(emailRaw))  # a corpus is a collection of text documents

Processing text in R
• docs <- tm_map(docs, content_transformer(tolower))  # converts all words to lower case
• docs <- tm_map(docs, removeNumbers)  # removes numbers
• docs <- tm_map(docs, removeWords, stopwords("english"))  # removes stop words like the, is, of
• docs <- tm_map(docs, removePunctuation)  # removes punctuation
• docs <- tm_map(docs, stripWhitespace)  # removes extra white space
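
To see the effect of these steps, they can be run on a small corpus. The sketch below assumes the tm package is installed; the two sample sentences are invented for the example, and VCorpus() is used here so that the cleaned text can be extracted with as.character().

```r
library(tm)

# Two invented sample documents
texts <- c("The meeting is at 10 AM, in Room 4!",
           "Please send   the report of 2014.")
docs <- VCorpus(VectorSource(texts))

# Same cleaning steps as above, in the same order
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removeNumbers)
docs <- tm_map(docs, removeWords, stopwords("english"))
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, stripWhitespace)

# Extract the cleaned text; trimws() drops leading and trailing spaces
cleaned <- trimws(unname(sapply(docs, as.character)))
print(cleaned)
```

Only lower-case words that are not on the English stop word list should remain in each document.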

SnowballC to Stem Text
• # Text stemming (reduces words to their root form)
• library("SnowballC")
• docs <- tm_map(docs, stemDocument)
• # Remove additional stopwords
• docs <- tm_map(docs, removeWords, c("clintonemailcom", "stategov", "hrod"))
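
Stemming can also be tried on individual words, which makes its effect easier to see. The sketch below calls wordStem() from SnowballC directly, assuming the package is installed; the sample words are arbitrary.

```r
library(SnowballC)

# A few arbitrary words and their stems
words <- c("running", "emails", "meetings")
stems <- wordStem(words, language = "english")
print(stems)   # "run" "email" "meet"
```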

Building the Term-Document Matrix
• dtm <- TermDocumentMatrix(docs)
• m <- as.matrix(dtm)
• v <- sort(rowSums(m), decreasing = TRUE)
• d <- data.frame(word = names(v), freq = v)
• head(d, 10)
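
The lines above count how often each term occurs across all documents and list the most frequent ones. As a rough illustration of that idea, the sketch below computes the same kind of word and frequency table with base R only. It does not use tm, and the sample texts are invented.

```r
# Invented, already cleaned documents (lower case, no punctuation)
texts <- c("send report today", "report meeting today", "meeting report")

# Split every document into words and count each word over all documents
words <- unlist(strsplit(texts, " ", fixed = TRUE))
counts <- table(words)

# Same layout as the data frame d: one row per word, most frequent first
v <- sort(counts, decreasing = TRUE)
d <- data.frame(word = names(v), freq = as.integer(v), stringsAsFactors = FALSE)
print(head(d, 10))
```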

Visualizations
• Word cloud: uses two libraries, wordcloud and RColorBrewer
• Sentiment analysis: uses the syuzhet library
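
A minimal sketch of both visualizations follows, assuming the wordcloud, RColorBrewer and syuzhet packages are installed. The word frequencies and the two sentences are invented for the example; in practice the data frame d and the email text from the earlier steps would be used.

```r
library(wordcloud)
library(RColorBrewer)
library(syuzhet)

# Invented word frequencies, in the same layout as the data frame d
d <- data.frame(word = c("report", "meeting", "today", "send"),
                freq = c(3, 2, 2, 1), stringsAsFactors = FALSE)

set.seed(1234)  # the word layout is random; a seed makes the plot reproducible
wordcloud(words = d$word, freq = d$freq, min.freq = 1, max.words = 100,
          random.order = FALSE, colors = brewer.pal(8, "Dark2"))

# Sentiment scores: higher values indicate more positive wording
sentences <- c("I love this wonderful plan", "I hate this terrible delay")
scores <- get_sentiment(sentences, method = "syuzhet")
print(scores)
```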
