R Programming
What is R?
• R is one of the world's most widely used programming languages for statistics.
• R is a programming language and software environment for:
  • Statistical analysis.
  • Graphical representation and reporting.
• R provides a suite of operators for calculations on arrays, lists, vectors and matrices.
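As a rough illustration of those operators, the sketch below uses only base R. The variable names are made up for this example and are not from the slides.

```r
# Vectors: arithmetic operators work element by element
v <- c(1, 2, 3)
w <- c(10, 20, 30)
v_sum <- v + w
v_scaled <- v * 2

# Matrices: %*% is matrix multiplication, * is element-wise
m <- matrix(1:4, nrow = 2)   # filled column by column
m_prod <- m %*% m
m_elem <- m * m

# Arrays: the same operators apply to higher-dimensional arrays
a <- array(1:8, dim = c(2, 2, 2))
a_plus <- a + 1

# Lists: can hold mixed types; lapply() applies a function to each element
l <- list(a = 1:3, b = c(2.5, 3.5))
l_means <- lapply(l, mean)
```

Arithmetic on a list itself is not defined, which is why the list case goes through lapply() rather than a bare operator.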
History
• R is an implementation of the S programming language. It was first designed by Ross Ihaka and Robert Gentleman at the University of Auckland in 1993.
• A stable version was released on October 31, 2014 (four months ago at the time of writing) by the R Development Core Team, under the GNU General Public License.
Introduction
• R is a programming language and software environment for statistical computing and graphics.
• The R language is widely used among statisticians for developing statistical software and for data analysis.
• It compiles and runs on a wide variety of UNIX platforms, Windows and Mac OS.
• R can be downloaded and installed from the CRAN website. CRAN stands for Comprehensive R Archive Network.
R - Data Types
Primitive (or atomic) data types in R are:
• Numeric (integer, double)
• Complex
• Character
• Logical
• Raw
Functions are also first-class objects in R, but they are not atomic types.
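A quick way to check these types is typeof(), sketched below with base R only. The variable names are illustrative. Note that is.atomic() reports FALSE for a function: in R's own terminology, functions are objects but not atomic vectors.

```r
x_int <- 42L          # the L suffix makes an integer
x_dbl <- 3.14
x_cpl <- 1 + 2i
x_chr <- "text"
x_lgl <- TRUE
f <- function(x) x + 1

types <- c(
  int = typeof(x_int),   # "integer"
  dbl = typeof(x_dbl),   # "double"
  cpl = typeof(x_cpl),   # "complex"
  chr = typeof(x_chr),   # "character"
  lgl = typeof(x_lgl),   # "logical"
  fun = typeof(f)        # "closure"
)

atomic_dbl <- is.atomic(x_dbl)   # TRUE
atomic_fun <- is.atomic(f)       # FALSE
```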
Text Mining with R
• R is an open source language and environment for statistical computing and graphics. Packages such as tm, SnowballC, ggplot2 and wordcloud can be installed for it and are used to carry out the earlier-mentioned steps in text processing. The first prerequisite is that R and RStudio are installed on your machine.
Packages Used in Text Mining
• RSQLite, 'SQLite' interface for R
• tm, framework for text mining applications
• SnowballC, text stemming library
• wordcloud, for making word cloud visualizations
• syuzhet, text sentiment analysis
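Reading SQLite data in R
# Read the comments from a CSV file
• comm <- read.csv("comments.csv", header = TRUE)
# Build a corpus from the comments column
• docs <- Corpus(VectorSource(comm$comments))
# Get all the emails sent by Hillary
# (emailHillary is assumed to be a data frame loaded earlier; it is not defined on this slide)
• emailRaw <- paste(emailHillary$EmailBody, collapse = " // ")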
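The slide title mentions SQLite, but the lines shown read a CSV file. A minimal sketch of reading from SQLite with the DBI and RSQLite packages might look like the following. The table name `Emails`, the `Sender` column and the sample rows are hypothetical and exist only to make the example self-contained; only the `EmailBody` column name comes from the slide.

```r
library(DBI)

# In-memory database so the example needs no file on disk
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table and rows, created here only for illustration
emails <- data.frame(
  Sender    = c("A", "B", "A"),
  EmailBody = c("first message", "second message", "third message"),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "Emails", emails)

# Select the bodies of all emails from one sender, in insertion order
from_a <- dbGetQuery(
  con,
  "SELECT EmailBody FROM Emails WHERE Sender = 'A' ORDER BY rowid"
)
dbDisconnect(con)

# Collapse the bodies into one string, as on the slide
emailRaw <- paste(from_a$EmailBody, collapse = " // ")
```

With a real database file, the path to that file would replace ":memory:" in dbConnect().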
Cleaning Text in R
• install.packages("tm")
• install.packages("NLP")
• Load the text mining package - library("tm")
• docs <- Corpus(VectorSource(emailRaw)) - A corpus is a collection of text documents
Processing text in R
• docs <- tm_map(docs, content_transformer(tolower)) - Converts all words to lower case
• docs <- tm_map(docs, removeNumbers) - Removes numbers
• docs <- tm_map(docs, removeWords, stopwords("english")) - Removes stop words such as "the", "is", "of"
• docs <- tm_map(docs, removePunctuation) - Removes punctuation
• docs <- tm_map(docs, stripWhitespace) - Removes extra white space
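The cleaning steps above can be tried on a tiny corpus, as in the sketch below. The two sample sentences are made up. VCorpus() is used here instead of Corpus() as an assumption on my part, because it keeps each document as a separate text object that is easy to inspect afterwards.

```r
library(tm)

# Two made-up documents standing in for the real email text
raw <- c("The meeting is at 10 AM!", "Please   send the 2 reports, thanks.")
docs <- VCorpus(VectorSource(raw))

docs <- tm_map(docs, content_transformer(tolower))       # lower case
docs <- tm_map(docs, removeNumbers)                      # drop digits
docs <- tm_map(docs, removeWords, stopwords("english"))  # drop stop words
docs <- tm_map(docs, removePunctuation)                  # drop punctuation
docs <- tm_map(docs, stripWhitespace)                    # collapse runs of spaces

# Pull the cleaned text back out as a plain character vector
cleaned <- trimws(unname(sapply(docs, as.character)))
```

The order matters somewhat: lower-casing first lets the stop word list, which is lower case, match words such as "The".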
SnowballC to Stem Text
# Text stemming (reduces words to their root form)
• library("SnowballC")
• docs <- tm_map(docs, stemDocument)
# Remove additional stop words
• docs <- tm_map(docs, removeWords, c("clintonemailcom", "stategov", "hrod"))
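To see what stemming does to individual words, SnowballC's wordStem() can be called directly on a character vector. This is a small sketch with made-up words, separate from the corpus pipeline on the slide.

```r
library(SnowballC)

# Different surface forms of the same root
words <- c("running", "runs", "connected", "connection")

# Reduce each word to its stem using the English stemmer
stems <- wordStem(words, language = "english")

# Words that share a root end up with the same stem
same_root <- stems[1] == stems[2]
```

Stems are not always dictionary words, which is why word clouds built from stemmed text can show truncated forms.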
Building a Term-Document Matrix
• dtm <- TermDocumentMatrix(docs)
• m <- as.matrix(dtm)
• v <- sort(rowSums(m), decreasing = TRUE)
• d <- data.frame(word = names(v), freq = v)
• head(d, 10)
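The same steps can be run on a toy corpus to see what each object holds. The three sample texts below are invented for illustration, and VCorpus() is my choice for building the corpus rather than something the slides specify.

```r
library(tm)

# Three tiny made-up documents
texts <- c("apple banana apple", "banana cherry", "apple cherry cherry")
docs <- VCorpus(VectorSource(texts))

dtm <- TermDocumentMatrix(docs)
m <- as.matrix(dtm)                        # rows = terms, columns = documents
v <- sort(rowSums(m), decreasing = TRUE)   # total count of each term
d <- data.frame(word = names(v), freq = v) # frequency table, most common first
head(d, 10)
```

Because each row of the matrix is a term, rowSums() gives the total number of times that term appears across all documents.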
Visualizations
• Word cloud: uses two libraries, wordcloud and RColorBrewer
• Sentiment analysis: uses the syuzhet library
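A minimal sketch of both visualization steps follows. The frequency table and the two sentences are hypothetical stand-ins for the real data, and the argument values (colour palette, word limits) are arbitrary choices rather than settings taken from the slides.

```r
library(wordcloud)
library(RColorBrewer)
library(syuzhet)

# Hypothetical word frequencies, shaped like the data frame d built earlier
d <- data.frame(word = c("meeting", "report", "thanks", "call"),
                freq = c(12, 8, 5, 3))

set.seed(1234)  # word placement is random; fix the seed for repeatability
wordcloud(words = d$word, freq = d$freq, min.freq = 1,
          max.words = 100, random.order = FALSE,
          colors = brewer.pal(8, "Dark2"))

# One sentiment score per input string (above 0 is positive, below 0 negative)
sentences <- c("I love this wonderful plan.", "This is a terrible mistake.")
scores <- get_sentiment(sentences, method = "syuzhet")
```

syuzhet also offers get_nrc_sentiment(), which returns counts per emotion category instead of a single score.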
Editor's Notes

  1. Old programming model; no multithreading. Data is loaded directly into memory, which limits functionality for larger datasets. Sandbox… subsample data. Microsoft is working on multicore R. h2o.