Text Mining:
An introduction
Charles Mendes de Macedo
INSPIRATION PLATFORM TEAM | APRIL 2019
Senior Software Engineer
MCSD, MCSA, MCTS
Agenda
1. What is Text Mining?
○ Objectives
○ Flow of steps
2. Technologies
3. Demonstration 1
4. Techniques
○ Word Clouds
○ Quantitative analysis of the text
○ N-Gram
5. Demonstration 2
What is Text Mining?
● Text mining is the process of discovering knowledge from textual
(unstructured) content.
● It is a subfield of Data Mining and can use Natural Language
Processing techniques.
Detail: According to Ah-hwee Tan [1], “80% of a company's information is
contained in text documents”.
[1] Text Mining: the state of the art and the challenges, Ah-hwee Tan – 2000.
Objectives
Main tasks that text mining can perform:
● Quantitative analysis of the text;
● Classification;
● Clustering;
● Text summarization;
● Named entity recognition;
● Sentiment analysis;
● Others.
Flow of steps
[Figure: a continuous flow through three stages]
● Pre-processing of documents;
● Pattern extraction;
● Assessment of knowledge.
Technologies
Languages used to do text mining
Technologies
Technologies used in the demonstrations
Demonstration 1 : pre-processing
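The demonstration itself is not captured in these slides. As a rough illustration of what pre-processing usually involves, here is a minimal Python sketch that lowercases the text, strips links, splits it into word tokens, and drops stop words. The function name `preprocess`, the tiny `STOP_WORDS` set, and the sample sentence are assumptions made for this example, not the code used in the talk.

```python
import re

# Assumption: a tiny illustrative stop-word list, not a standard one.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "from"}


def preprocess(text):
    """Lowercase, strip URLs, tokenise on letters, and remove stop words."""
    text = text.lower()
    # Links are common in tweets and carry no vocabulary information.
    text = re.sub(r"https?://\S+", " ", text)
    # Keep runs of ASCII letters only; digits and punctuation are dropped.
    tokens = re.findall(r"[a-z]+", text)
    return [token for token in tokens if token not in STOP_WORDS]


sample = "Text mining is the process of discovering knowledge from text: https://example.com"
tokens = preprocess(sample)
print(tokens)
```

A real pipeline would typically use a fuller stop-word list and may add stemming or lemmatisation; those steps are left out here to keep the sketch short.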
Techniques : Word clouds
A word cloud is a graphical representation of word frequencies in which the
most frequent terms are displayed most prominently.
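The idea behind a word cloud can be sketched without any drawing library: count the words, then scale each word's font size with its count. The Python sketch below does only that sizing step. The function name `font_sizes`, the size range, and the sample tokens are assumptions for illustration; rendering the cloud would need a plotting or word-cloud package, which is not shown.

```python
from collections import Counter


def font_sizes(tokens, min_size=10, max_size=48, top=5):
    """Map the `top` most frequent tokens to font sizes by linear scaling."""
    counts = Counter(tokens).most_common(top)
    if not counts:
        return {}
    highest = counts[0][1]
    lowest = counts[-1][1]
    # Fall back to 1 when all counts are equal, to avoid dividing by zero.
    span = (highest - lowest) or 1
    return {
        word: min_size + (count - lowest) * (max_size - min_size) / span
        for word, count in counts
    }


sizes = font_sizes(["goal", "goal", "goal", "match", "match", "fans"])
for word, size in sizes.items():
    print(f"{word}: {size:.0f}pt")
```

The most frequent word gets `max_size` and the least frequent of the selected words gets `min_size`; everything else falls in between.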
Techniques : Quantitative analysis of the text
It counts how often each word occurs in the text; the counts are usually
shown as a chart that highlights the most frequent terms.
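A minimal Python sketch of this kind of frequency count, assuming the text has already been split into tokens. The function name `term_frequencies` and the sample tokens are hypothetical; the text-based bar chart merely stands in for the graphical chart.

```python
from collections import Counter


def term_frequencies(tokens):
    """Return (term, count, relative frequency) rows, most frequent first."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return [(term, count, count / total) for term, count in counts.most_common()]


rows = term_frequencies(["text", "mining", "text", "data", "mining", "text"])
for term, count, share in rows:
    # One '#' per occurrence gives a crude horizontal bar chart.
    print(f"{term:<8} {'#' * count} {share:.0%}")
```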
Techniques : N-gram
An n-gram is a contiguous sequence of n items (for example, words or
characters) from a text.
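As a small illustration, word-level n-grams can be produced by sliding a window of length n over the token list. The function name `ngrams` and the sample tokens below are assumptions for this sketch.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A text shorter than n yields an empty list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


words = ["text", "mining", "is", "fun"]
bigrams = ngrams(words, 2)
print(bigrams)
# [('text', 'mining'), ('mining', 'is'), ('is', 'fun')]
```

With n = 1 the result is the individual words (unigrams); with n = 2, pairs of adjacent words (bigrams), and so on.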
Demonstration 2
I'm going to apply these basic text mining techniques to tweets from the
four biggest football teams in Portugal.
I like this!
Starting the study - Courses
● Coursera:
○ Text Mining and Analytics;
○ Machine Learning, by Andrew Ng;
○ Data Science.
● edX:
○ Data, Analytics and Learning.
Starting the study - Books
https://www.tidytextmining.com/
Thank you
