Text Mining in R
Manjeet Singh,
Consultant – ASG Group
Manjeet Singh
Information & Data Management Consultant
ASG Group
Information and Data Management Consultant
at ASG Group with more than 15 years'
experience in the information and data
management domain.
Has worked in industries such as
Energy, Oil and Gas, Banking, Local Government
and IT services.
An aspiring Data Scientist, passionate about
Artificial Intelligence (AI) and Machine Learning
(ML).
/manjeetsingh-aibipro @manjeetsinghitp
Thanks to all sponsors
Agenda
Text Analytics / Text Mining
Text Analytics, also known as text mining, is the process of examining
large collections of written resources to generate new information, and to
transform the unstructured text into structured data for use in further
analysis.
Text Mining
Extract Documents
1. Text sources such as websites, PDFs and databases
2. Move the document text into a corpus.
Corpus: A structured set of texts annotated with additional
metadata and details.
Transformation
1. Convert to lower case
2. Remove punctuation
3. Remove stop words (e.g. ‘a’, ‘an’, ‘be’)
Extract Features
1. Convert text strings into quantifiable measures
2. Count the frequency of each term and form a
vector
Perform Analysis (some approaches)
1. Word frequency
2. Document classification
3. Locating a specific set of words
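The transformation and feature-extraction steps above can be sketched in a few lines. This is a minimal illustration in plain Python (standard library only), not the R packages used in the talk. The two sample documents, the tiny stop-word list and the helper names `transform`, `term_frequencies` and `to_vector` are all assumptions made for the example.

```python
import string
from collections import Counter

# Hypothetical mini-corpus; real text would come from websites, PDFs or databases.
corpus = {
    "doc1": "The cat sat on the mat.",
    "doc2": "A dog and a cat can be friends!",
}

# Tiny illustrative stop-word list (an assumption, not a standard lexicon).
STOP_WORDS = {"a", "an", "and", "be", "can", "on", "the"}


def transform(text):
    """Lower-case the text, strip ASCII punctuation, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [word for word in no_punct.split() if word not in STOP_WORDS]


def term_frequencies(docs):
    """Count how often each remaining term occurs in each document."""
    return {name: Counter(transform(text)) for name, text in docs.items()}


def to_vector(counts, vocabulary):
    """Turn a document's term counts into a vector ordered by the vocabulary."""
    return [counts[term] for term in vocabulary]


frequencies = term_frequencies(corpus)
# Shared vocabulary: every term seen in any document, in alphabetical order.
vocabulary = sorted(set().union(*frequencies.values()))
vectors = {name: to_vector(counts, vocabulary) for name, counts in frequencies.items()}

for name, vector in vectors.items():
    print(name, dict(zip(vocabulary, vector)))
```

Each document ends up as a vector of term counts over a shared vocabulary, which is the structured form that word-frequency analysis or document classification can then work on.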
Text Mining
Sentiment analysis is the process of determining
whether a piece of writing is positive, negative or
neutral. It is also known as opinion mining:
deriving the opinion or attitude of a speaker.
https://www.lexalytics.com/technology/sentiment
Sentiment Analysis
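One common way to approach this is lexicon-based scoring: look up each word in a list of words labelled positive or negative and add up the labels. The sketch below assumes that approach; the slides do not say which method the demo uses. It is written in plain Python (standard library only), and the six-word `LEXICON` and the `sentiment` helper are hypothetical, made up for the example.

```python
import string

# Hypothetical mini-lexicon mapping words to a polarity score.
# Real sentiment lexicons contain thousands of entries.
LEXICON = {
    "good": 1, "great": 1, "happy": 1,
    "bad": -1, "poor": -1, "sad": -1,
}


def sentiment(text):
    """Classify text as positive, negative or neutral by summing word scores."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    # Words missing from the lexicon contribute nothing to the score.
    score = sum(LEXICON.get(word, 0) for word in cleaned.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for sentence in [
    "What a great, happy day!",
    "The service was bad and the food was poor.",
    "The meeting is at noon.",
]:
    print(sentiment(sentence), "-", sentence)
```

A word-by-word lookup like this ignores negation and context ("not good" still scores as positive), so it is best read as a starting point rather than a complete method.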
Text Mining in R
Text mining in R requires the following:
Packages
tidyr: Designed for data tidying
tidytext: Makes text mining tasks easier and more effective. Contains lexicons
dplyr: A flexible grammar of data manipulation
stringr: Simple, consistent wrappers for common string operations
ggplot2: Create graphs
Dataset package
janeaustenr: Contains the complete text of Jane Austen's 6 completed, published novels,
formatted to be convenient for text analysis
Lexicon: The vocabulary of a person, language, or branch of knowledge
R in SQL Server
Running R code in T-SQL requires one of the following:
• SQL Server 2017 Machine Learning Services, with the R language installed
• SQL Server 2016 R Services
• Azure Data Science VM
DEMO
• Perform text mining using R packages
• Run an R script from within T-SQL
Thank you!
Manjeet Singh
Information and Data Management Consultant,
ASG Group
/manjeetsingh-aibipro @manjeetsinghitp
