Text Mining:
An introduction
Charles Mendes de Macedo
Senior Software Engineer
MCSD, MCSA, MCTS
INSPIRATION PLATFORM TEAM | APRIL 2019
Agenda
1. What is Text Mining?
○ Objectives
○ Flow of steps
2. Technologies
3. Demonstration 1
4. Techniques
○ Word Clouds
○ Quantitative analysis of the text
○ N-Gram
5. Demonstration 2
What is Text Mining?
● Text mining is the process of discovering knowledge from textual
(unstructured) content.
● It is a subfield of Data Mining and can use Natural Language
Processing techniques.
Detail: According to Ah-hwee Tan [1], “80% of a company's information is
contained in text documents”.
[1] Text Mining: the state of the art and the challenges, Ah-hwee Tan – 2000.
Objectives
Main tasks that text mining can do:
● Quantitative analysis of the text;
● Classification;
● Clustering;
● Text summarization;
● Named entity recognition;
● Sentiment analysis;
● Others.
Flow of steps
Pre-processing of documents → Pattern extraction → Assessment of knowledge
(continuous flow)
Technologies
Languages used to do text mining
Technologies
Technologies used in the demonstrations
Demonstration 1 : pre-processing
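The demonstration itself is not reproduced in the slides, so the following is only a minimal sketch of what a pre-processing step commonly looks like. It assumes Python (the slides do not say which language was used), and the stop-word list and the preprocess function are hypothetical names chosen for illustration.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real projects use a fuller one.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

def preprocess(text):
    """Lowercase the text, strip URLs, tokenize, and drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)        # remove URLs
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text)  # alphabetic tokens, keeps "company's"
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The Art of Text Mining"))
```

The token list returned here is the kind of input the later techniques (word clouds, frequency counts, n-grams) work on.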
Techniques : Word clouds
A word cloud is a graphical representation of word frequencies that
highlights the most frequent terms.
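A word cloud typically draws each term larger the more often it occurs. The sketch below shows only that scaling step, not the drawing itself. It assumes Python and a linear scale; font_sizes and its parameters are hypothetical names, not part of any library.

```python
from collections import Counter

def font_sizes(tokens, min_size=10, max_size=40, top=5):
    """Map the `top` most frequent tokens to font sizes, scaled linearly by count."""
    counts = Counter(tokens).most_common(top)
    if not counts:
        return {}
    highest = counts[0][1]
    lowest = counts[-1][1]
    span = highest - lowest or 1  # avoid division by zero when all counts are equal
    return {
        word: min_size + (count - lowest) * (max_size - min_size) // span
        for word, count in counts
    }

print(font_sizes("goal goal goal team team match".split()))
```

A plotting or word-cloud library would then place each term on the canvas at the computed size.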
Techniques : Quantitative analysis of the text
It counts how often each word occurs in the text, highlighting the most
frequent terms.
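The counting behind a quantitative analysis can be sketched in a few lines. This assumes Python and an already tokenized text; describe is a hypothetical helper name, and the chosen measures (token count, distinct words, relative frequency) are one reasonable selection rather than the set used in the talk.

```python
from collections import Counter

def describe(tokens, top=3):
    """Return basic counts for a token list plus the `top` most frequent terms."""
    counts = Counter(tokens)
    total = len(tokens)
    return {
        "tokens": total,          # how many words the text has
        "distinct": len(counts),  # how many different words it has
        # (word, absolute count, relative frequency rounded to 2 decimals)
        "top": [(w, c, round(c / total, 2)) for w, c in counts.most_common(top)],
    }

print(describe("a b a c a b".split(), top=2))
```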
Techniques : N-gram
An n-gram is a contiguous sequence of n items from a text.
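As a minimal sketch, assuming Python and words as the items, n-grams can be produced by sliding a window of length n over the token list. The function name ngrams is chosen here for illustration.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Bigrams (n = 2) of a four-word sentence
print(ngrams(["text", "mining", "is", "fun"], 2))
```

With n = 1 the result is the individual words, and a text shorter than n yields an empty list.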
Demonstration 2
I'm going to apply these basic text mining techniques to tweets from the
four biggest football teams in Portugal.
I like this!
Starting the study - Courses
● Coursera:
○ Text Mining and Analytics;
○ Machine Learning, by Andrew Ng;
○ Data Science.
● edX:
○ Data, Analytics and Learning.
Starting the study - Books
https://www.tidytextmining.com/
Thank you