This document provides an introduction to text mining. It discusses what text mining is, its objectives, and the typical flow of steps. The objectives of text mining include quantitative text analysis, classification, clustering, summarization, and entity and sentiment analysis. It then covers the technologies used and presents demonstrations of text pre-processing, word clouds, quantitative text analysis, and n-gram analysis. The document concludes with recommended courses and books for learning more about text mining.
2. Agenda
1. What is Text Mining?
○ Objectives
○ Flow of steps
2. Technologies
3. Demonstration 1
4. Techniques
○ Word Clouds
○ Quantitative analysis of the text
○ N-Gram
5. Demonstration 2
Text Mining: An introduction
3. What is Text Mining?
● Text mining is the process of discovering knowledge from textual
(unstructured) content.
● It is a subfield of Data Mining and can use Natural Language
Processing techniques.
Detail: According to Ah-Hwee Tan [1], “80% of a company's information is contained in text documents.”
[1] Ah-Hwee Tan, “Text Mining: The state of the art and the challenges”, 2000.
4. Objectives
Main tasks that text mining can perform:
● Quantitative analysis of the text;
● Classification;
● Clustering;
● Text summarization;
● Named entity recognition;
● Sentiment analysis;
● Others.
5. Flow of steps
[Figure: continuous flow of steps, Pre-processing of documents → Pattern extraction → Assessment of knowledge]
8. Demonstration 1 : pre-processing
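A minimal pre-processing sketch in Python, assuming tweet-like input: it lowercases the text, strips URLs, @mentions, punctuation and bare numbers, splits on whitespace, and drops stopwords. The small `STOPWORDS` set and the `preprocess` helper are illustrative assumptions, not the exact steps used in the original demonstration.

```python
import re
import string

# Illustrative stopword list (an assumption); real projects usually use a fuller list.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "at", "for", "on", "this"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Map every punctuation character to a space so "goal!!" becomes "goal"
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text: str) -> list[str]:
    """Lowercase, remove URLs/mentions/punctuation, tokenize, drop stopwords and numbers."""
    text = text.lower()
    text = URL_RE.sub(" ", text)      # remove links first, before punctuation is stripped
    text = MENTION_RE.sub(" ", text)  # remove @mentions
    text = text.translate(PUNCT_TABLE)
    tokens = text.split()
    return [t for t in tokens if t not in STOPWORDS and not t.isdigit()]


print(preprocess("What a GOAL!! @someclub wins the match 2-0 https://example.com"))
# → ['what', 'goal', 'wins', 'match']
```

Removing URLs and mentions before stripping punctuation matters: otherwise the URL would be split into fragments such as `https` and `example` that survive as tokens.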
9. Techniques : Word clouds
It is a graphical representation of the frequency of words, highlighting the
most frequent terms.
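The core of a word cloud is mapping each word's frequency to a font size. The sketch below assumes a simple linear scale between a minimum and maximum size for the top words; the `font_sizes` helper and its parameters are hypothetical, and libraries such as the Python `wordcloud` package additionally handle layout and drawing.

```python
from collections import Counter


def font_sizes(tokens, max_words=5, min_size=10, max_size=60):
    """Map each of the most frequent words to a font size that grows linearly with its count."""
    counts = Counter(tokens).most_common(max_words)
    if not counts:
        return {}
    top = counts[0][1]
    low = counts[-1][1]
    span = (top - low) or 1  # avoid division by zero when all counts are equal
    return {
        word: round(min_size + (count - low) / span * (max_size - min_size))
        for word, count in counts
    }


tokens = "goal match goal team goal match fans".split()
print(font_sizes(tokens))
# → {'goal': 60, 'match': 35, 'team': 10, 'fans': 10}
```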
10. Techniques : Quantitative analysis of the text
It measures the text numerically, for example by counting words and term frequencies, to identify the most frequent terms.
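A sketch of a few common quantitative measures over an already tokenized text: total tokens, vocabulary size, type-token ratio (vocabulary divided by tokens), and the top terms with their relative frequency. The choice of measures and the `text_stats` helper are assumptions for illustration.

```python
from collections import Counter


def text_stats(tokens, top_n=3):
    """Basic quantitative measures of a tokenized text."""
    counts = Counter(tokens)
    total = len(tokens)
    return {
        "tokens": total,
        "vocabulary": len(counts),
        # Lexical diversity: distinct words / total words
        "type_token_ratio": len(counts) / total if total else 0.0,
        # (word, count, relative frequency = count / total tokens)
        "top_terms": [(w, c, c / total) for w, c in counts.most_common(top_n)],
    }


stats = text_stats("goal match goal team goal match fans".split())
print(stats["tokens"], stats["vocabulary"])  # → 7 4
print(stats["top_terms"][0][:2])             # → ('goal', 3)
```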
11. Techniques : N-gram
An n-gram is a contiguous sequence of n items (for example, words or characters) from a text.
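A minimal sketch of word n-grams: zipping the token list with copies of itself shifted by 1, 2, ..., n-1 positions yields every contiguous window of n tokens. The `ngrams` helper name is illustrative.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples (n=2 gives bigrams)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # zip stops at the shortest slice, so only complete windows are produced
    return list(zip(*(tokens[i:] for i in range(n))))


tokens = ["the", "match", "ended", "in", "a", "draw"]
print(ngrams(tokens, 2))
# → [('the', 'match'), ('match', 'ended'), ('ended', 'in'), ('in', 'a'), ('a', 'draw')]
```

Counting these tuples with `collections.Counter` gives n-gram frequencies, just as counting single tokens gives word frequencies.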
12. Demonstration 2
I'm going to apply these basic text mining techniques to tweets from the four biggest football teams in Portugal.
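A sketch of how these techniques combine per team: tokenize each tweet, then count the most frequent words and bigrams. The `TWEETS` data and team labels are hypothetical placeholders, not the tweets from the original demonstration, and the stopword list is an assumption.

```python
import re
from collections import Counter

# Hypothetical placeholder tweets; the real demonstration used the teams' actual tweets.
TWEETS = {
    "team_a": ["Great win today! Thanks to all the fans", "Next match on Sunday, great atmosphere"],
    "team_b": ["Training session before the match", "The match tickets are on sale"],
}

STOPWORDS = {"the", "to", "all", "on", "are", "before", "a", "and"}


def tokenize(text):
    # Lowercase, keep only alphabetic runs, drop stopwords
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def summarize(tweets, top_n=2):
    """Return the top words and top bigrams across a list of tweets."""
    words, bigrams = Counter(), Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        words.update(tokens)
        # Bigrams are built per tweet, after stopword removal
        bigrams.update(zip(tokens, tokens[1:]))
    return words.most_common(top_n), bigrams.most_common(top_n)


for team, tweets in TWEETS.items():
    top_words, top_bigrams = summarize(tweets)
    print(team, top_words, top_bigrams)
```

Building bigrams after stopword removal is a design choice: it surfaces content pairs, but a pair may join words that were not adjacent in the original tweet.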
13. I like this!
14. Starting the study - Courses
● Coursera:
○ Text Mining and Analytics;
○ Machine Learning, by Andrew Ng;
○ Data Science.
● edX:
○ Data, Analytics and Learning.
15. Starting the study - Books
● Text Mining with R: A Tidy Approach, by Julia Silge and David Robinson: https://www.tidytextmining.com/