SlideShare a Scribd company logo
1 of 17
Download to read offline
Introducing Journal:
A Survey on Text Mining-techniques and
application
Ryota Eisaki
January 14, 2021
Author: Prof.S M.Inazalkar, Jai sharma
Publisher: International Journal of Research In Science Engineering
1
Abstract
In this paper, we combine data mining methods for analyzing
quantitative and qualitative data from financial reports, in order to
see if the textual part of the report contains some indication about
future financial performance.
2
1. Introduction
Text is the most common vehicle for the formal exchange of
information.
The tool is a text mining system which has the capability to
analyze large quantities of natural language text and detects lexical
and linguistic usage patterns in an attempt to extract meaningful
and useful information.
3
2. Related Work
This paper presents an inventive and valuable pattern discovery
technique which includes the processes of pattern deploying and
pattern evolving, to advance the effectiveness of using and
updating discovered patterns for finding appropriate and interesting
information.
4
3. Text Mining Process
Document Gathering
the text documents are collected in different formats
Document Pre-Processing
(a) Tokenization
the given document string devided into one unit
or token
(b) Removal of Stop Word
the removal of useful words like, a, but, of, etc.
(c) Stemming
consider inflectional and derivational stemming
Text Transformation
Vector Space Model and Bag of words
5
3. Text Mining Process
Feature Selection (attribute selection)
This method results in giving low database space,
minimal search technique by taking out irrelevant
feature from input document.
Data mining/Pattern Selection
Evaluate
measures the outcome
6
4. Text Mining Techniques
1. Categorization
the assignment of natural language documents to
predefined categories according to their content.
finding main theme of document by adding metadata
and analyzing document.
7
4. Text Mining Techniques
1.1. Decision Trees
rebuild the manual categorization of training
documents
The main advantage of decision tree is the fact that
the output tree is easy to interpret even for persons
who are not familiar with the details of the model.
the risk is known as "over fitting" but there are ways
to solve this.
8
4. Text Mining Techniques
2. Clustering
divide the similar text into the same cluster.
it differs from classification in than documents are
clusters on the fly instead of through the use of
pre-defined topics.
9
4. Text Mining Techniques
2.1.Hierarchical Clustering
It works at different level of granularity.
1. Bottom up hierarchical clustering method:
a Consider each document as a single cluster
b Calculate similarity of cluster then marge the
two clusters having maximum similarity
c Repeat b till single cluster is formed.
2. Top down hierarchical clustering method:
starts from a single cluster as whole, then it get
split iteratively into various cluster on the basis
of smallest similarity measure.
The advantage of Hierarchical Clustering is that it can use any form
of similarity measure and the disadvantage is that once the clusters
are formed, cannot be rebuild.
10
4. Text Mining Techniques
3. Information Extraction
the documents are first converted into the structured
databases on which data mining techniques can be
applied to extract knowledge or interesting patterns.
11
4. Text Mining Techniques
4. Information Visualization
Visual text mining, or information visualization, puts
large textual sources in a visual hierarchy or map and
provides browsing capabilities, in addition to simple
searching.
12
4. Text Mining Techniques
5. Natural Language Processing(NLP)
NLP is a technology that concerns with natural
language generation(NLG) and natural language
understanding(NLU).
13
4. Text Mining Techniques
6. Topic Tracking
A topic tracking system mechanism by custody of
user profiles and, based on the documents the user
views, guess other documents of interest to the user.
14
5. Applications
• Business Intelligence
• Bioinformatics
• Security Application
the analysis and monitoring of documents
• Human Resource Management
for the purpose of recruiting the candidate by reading and
writing their CVs
• Web Search Enhancement
15
5. Applications
others
• Publishing and media
• Telecommunications, energy and other services industries
• Information technology sector and Internet
• Banks, insurance and financial markets
• Pharmaceutical and research companies and healthcare
16
6. Conclusion
Text mining can be defined as a technique which is used to extract
interesting information or knowledge from the text documents
which are usually in the unstructured form.
In near future the proposed technique is implemented using JAVA
technology and the comparative results are provided.
Applications in the field such as identifying business intelligence,
bioinformatics, security application, web search enhancement, CRM
and other sectors are studied.
17

More Related Content

What's hot

Text databases and information retrieval
Text databases and information retrievalText databases and information retrieval
Text databases and information retrievalunyil96
 
Text Mining Framework
Text Mining FrameworkText Mining Framework
Text Mining FrameworkPrakhyath Rai
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibEl Habib NFAOUI
 
Multidimensioal database
Multidimensioal  databaseMultidimensioal  database
Multidimensioal databaseTPO TPO
 
Great model a model for the automatic generation of semantic relations betwee...
Great model a model for the automatic generation of semantic relations betwee...Great model a model for the automatic generation of semantic relations betwee...
Great model a model for the automatic generation of semantic relations betwee...ijcsity
 
A profile of Applied Data Analysis Lab (ADA Lab)
A profile of Applied Data Analysis Lab (ADA Lab)A profile of Applied Data Analysis Lab (ADA Lab)
A profile of Applied Data Analysis Lab (ADA Lab)Lukasz Bolikowski
 
Ontology Based PMSE with Manifold Preference
Ontology Based PMSE with Manifold PreferenceOntology Based PMSE with Manifold Preference
Ontology Based PMSE with Manifold PreferenceIJCERT
 
Text Mining: Beyond Extraction Towards Exploitation
Text Mining: Beyond Extraction Towards ExploitationText Mining: Beyond Extraction Towards Exploitation
Text Mining: Beyond Extraction Towards Exploitationbutest
 
Umesha naik metadata
Umesha naik metadataUmesha naik metadata
Umesha naik metadataUmesha Naik
 
A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...
A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...
A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...IJDKP
 
Introduction to Text Mining and Semantics
Introduction to Text Mining and SemanticsIntroduction to Text Mining and Semantics
Introduction to Text Mining and SemanticsSeth Grimes
 
Topic Modeling : Clustering of Deep Webpages
Topic Modeling : Clustering of Deep WebpagesTopic Modeling : Clustering of Deep Webpages
Topic Modeling : Clustering of Deep Webpagescsandit
 
Classification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern MiningClassification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern MiningIOSR Journals
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Dataopenminted_eu
 
The Future is All Mine
The Future is All MineThe Future is All Mine
The Future is All Mineopenminted_eu
 

What's hot (19)

Text MIning
Text MIningText MIning
Text MIning
 
Text databases and information retrieval
Text databases and information retrievalText databases and information retrieval
Text databases and information retrieval
 
Text Mining Framework
Text Mining FrameworkText Mining Framework
Text Mining Framework
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_Habib
 
Multidimensioal database
Multidimensioal  databaseMultidimensioal  database
Multidimensioal database
 
Tesxt mining
Tesxt miningTesxt mining
Tesxt mining
 
Great model a model for the automatic generation of semantic relations betwee...
Great model a model for the automatic generation of semantic relations betwee...Great model a model for the automatic generation of semantic relations betwee...
Great model a model for the automatic generation of semantic relations betwee...
 
A profile of Applied Data Analysis Lab (ADA Lab)
A profile of Applied Data Analysis Lab (ADA Lab)A profile of Applied Data Analysis Lab (ADA Lab)
A profile of Applied Data Analysis Lab (ADA Lab)
 
Ontology Based PMSE with Manifold Preference
Ontology Based PMSE with Manifold PreferenceOntology Based PMSE with Manifold Preference
Ontology Based PMSE with Manifold Preference
 
Text Mining: Beyond Extraction Towards Exploitation
Text Mining: Beyond Extraction Towards ExploitationText Mining: Beyond Extraction Towards Exploitation
Text Mining: Beyond Extraction Towards Exploitation
 
Umesha naik metadata
Umesha naik metadataUmesha naik metadata
Umesha naik metadata
 
A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...
A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...
A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...
 
Introduction to Text Mining and Semantics
Introduction to Text Mining and SemanticsIntroduction to Text Mining and Semantics
Introduction to Text Mining and Semantics
 
Topic Modeling : Clustering of Deep Webpages
Topic Modeling : Clustering of Deep WebpagesTopic Modeling : Clustering of Deep Webpages
Topic Modeling : Clustering of Deep Webpages
 
D1802023136
D1802023136D1802023136
D1802023136
 
Classification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern MiningClassification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern Mining
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Data
 
The Future is All Mine
The Future is All MineThe Future is All Mine
The Future is All Mine
 
Dspace OAI-PMH
Dspace OAI-PMHDspace OAI-PMH
Dspace OAI-PMH
 

Similar to A Survey on Text Mining-techniques and application

"Analysis of Different Text Classification Algorithms: An Assessment "
"Analysis of Different Text Classification Algorithms: An Assessment ""Analysis of Different Text Classification Algorithms: An Assessment "
"Analysis of Different Text Classification Algorithms: An Assessment "ijtsrd
 
Web_Mining_Overview_Nfaoui_El_Habib
Web_Mining_Overview_Nfaoui_El_HabibWeb_Mining_Overview_Nfaoui_El_Habib
Web_Mining_Overview_Nfaoui_El_HabibEl Habib NFAOUI
 
ESTIMATION OF REGRESSION COEFFICIENTS USING GEOMETRIC MEAN OF SQUARED ERROR F...
ESTIMATION OF REGRESSION COEFFICIENTS USING GEOMETRIC MEAN OF SQUARED ERROR F...ESTIMATION OF REGRESSION COEFFICIENTS USING GEOMETRIC MEAN OF SQUARED ERROR F...
ESTIMATION OF REGRESSION COEFFICIENTS USING GEOMETRIC MEAN OF SQUARED ERROR F...ijaia
 
IRJET- Concept Extraction from Ambiguous Text Document using K-Means
IRJET- Concept Extraction from Ambiguous Text Document using K-MeansIRJET- Concept Extraction from Ambiguous Text Document using K-Means
IRJET- Concept Extraction from Ambiguous Text Document using K-MeansIRJET Journal
 
Post 1What is text analytics How does it differ from text mini.docx
Post 1What is text analytics How does it differ from text mini.docxPost 1What is text analytics How does it differ from text mini.docx
Post 1What is text analytics How does it differ from text mini.docxstilliegeorgiana
 
Post 1What is text analytics How does it differ from text mini
Post 1What is text analytics How does it differ from text miniPost 1What is text analytics How does it differ from text mini
Post 1What is text analytics How does it differ from text minianhcrowley
 
Text Mining: Beyond Extraction Towards Exploitation
Text Mining: Beyond Extraction Towards ExploitationText Mining: Beyond Extraction Towards Exploitation
Text Mining: Beyond Extraction Towards Exploitationbutest
 
Text mining presentation in Data mining Area
Text mining presentation in Data mining AreaText mining presentation in Data mining Area
Text mining presentation in Data mining AreaMahamudHasanCSE
 
An Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using ClusteringAn Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using ClusteringKelly Lipiec
 
Information Retrieval based on Cluster Analysis Approach
Information Retrieval based on Cluster Analysis ApproachInformation Retrieval based on Cluster Analysis Approach
Information Retrieval based on Cluster Analysis ApproachAIRCC Publishing Corporation
 
INFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACH
INFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACHINFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACH
INFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACHijcsit
 
A Review on Text Mining in Data Mining
A Review on Text Mining in Data MiningA Review on Text Mining in Data Mining
A Review on Text Mining in Data Miningijsc
 
An Investigation of Keywords Extraction from Textual Documents using Word2Ve...
 An Investigation of Keywords Extraction from Textual Documents using Word2Ve... An Investigation of Keywords Extraction from Textual Documents using Word2Ve...
An Investigation of Keywords Extraction from Textual Documents using Word2Ve...IJCSIS Research Publications
 
A Review on Text Mining in Data Mining
A Review on Text Mining in Data Mining  A Review on Text Mining in Data Mining
A Review on Text Mining in Data Mining ijsc
 
Automatic Text Classification using Supervised Learning
Automatic Text Classification using Supervised LearningAutomatic Text Classification using Supervised Learning
Automatic Text Classification using Supervised Learningdbpublications
 
Survey on Existing Text Mining Frameworks and A Proposed Idealistic Framework...
Survey on Existing Text Mining Frameworks and A Proposed Idealistic Framework...Survey on Existing Text Mining Frameworks and A Proposed Idealistic Framework...
Survey on Existing Text Mining Frameworks and A Proposed Idealistic Framework...ijceronline
 
MULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUES
MULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUESMULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUES
MULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUESijcseit
 
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...mlaij
 

Similar to A Survey on Text Mining-techniques and application (20)

"Analysis of Different Text Classification Algorithms: An Assessment "
"Analysis of Different Text Classification Algorithms: An Assessment ""Analysis of Different Text Classification Algorithms: An Assessment "
"Analysis of Different Text Classification Algorithms: An Assessment "
 
Web_Mining_Overview_Nfaoui_El_Habib
Web_Mining_Overview_Nfaoui_El_HabibWeb_Mining_Overview_Nfaoui_El_Habib
Web_Mining_Overview_Nfaoui_El_Habib
 
ESTIMATION OF REGRESSION COEFFICIENTS USING GEOMETRIC MEAN OF SQUARED ERROR F...
ESTIMATION OF REGRESSION COEFFICIENTS USING GEOMETRIC MEAN OF SQUARED ERROR F...ESTIMATION OF REGRESSION COEFFICIENTS USING GEOMETRIC MEAN OF SQUARED ERROR F...
ESTIMATION OF REGRESSION COEFFICIENTS USING GEOMETRIC MEAN OF SQUARED ERROR F...
 
IRJET- Concept Extraction from Ambiguous Text Document using K-Means
IRJET- Concept Extraction from Ambiguous Text Document using K-MeansIRJET- Concept Extraction from Ambiguous Text Document using K-Means
IRJET- Concept Extraction from Ambiguous Text Document using K-Means
 
Post 1What is text analytics How does it differ from text mini.docx
Post 1What is text analytics How does it differ from text mini.docxPost 1What is text analytics How does it differ from text mini.docx
Post 1What is text analytics How does it differ from text mini.docx
 
Post 1What is text analytics How does it differ from text mini
Post 1What is text analytics How does it differ from text miniPost 1What is text analytics How does it differ from text mini
Post 1What is text analytics How does it differ from text mini
 
Text Mining: Beyond Extraction Towards Exploitation
Text Mining: Beyond Extraction Towards ExploitationText Mining: Beyond Extraction Towards Exploitation
Text Mining: Beyond Extraction Towards Exploitation
 
Text mining presentation in Data mining Area
Text mining presentation in Data mining AreaText mining presentation in Data mining Area
Text mining presentation in Data mining Area
 
An Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using ClusteringAn Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using Clustering
 
Information Retrieval based on Cluster Analysis Approach
Information Retrieval based on Cluster Analysis ApproachInformation Retrieval based on Cluster Analysis Approach
Information Retrieval based on Cluster Analysis Approach
 
INFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACH
INFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACHINFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACH
INFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACH
 
10.1.1.118.1099
10.1.1.118.109910.1.1.118.1099
10.1.1.118.1099
 
A Review on Text Mining in Data Mining
A Review on Text Mining in Data MiningA Review on Text Mining in Data Mining
A Review on Text Mining in Data Mining
 
An Investigation of Keywords Extraction from Textual Documents using Word2Ve...
 An Investigation of Keywords Extraction from Textual Documents using Word2Ve... An Investigation of Keywords Extraction from Textual Documents using Word2Ve...
An Investigation of Keywords Extraction from Textual Documents using Word2Ve...
 
A Review on Text Mining in Data Mining
A Review on Text Mining in Data Mining  A Review on Text Mining in Data Mining
A Review on Text Mining in Data Mining
 
Automatic Text Classification using Supervised Learning
Automatic Text Classification using Supervised LearningAutomatic Text Classification using Supervised Learning
Automatic Text Classification using Supervised Learning
 
B0410206010
B0410206010B0410206010
B0410206010
 
Survey on Existing Text Mining Frameworks and A Proposed Idealistic Framework...
Survey on Existing Text Mining Frameworks and A Proposed Idealistic Framework...Survey on Existing Text Mining Frameworks and A Proposed Idealistic Framework...
Survey on Existing Text Mining Frameworks and A Proposed Idealistic Framework...
 
MULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUES
MULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUESMULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUES
MULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUES
 
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
 

Recently uploaded

Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationBoston Institute of Analytics
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 

Recently uploaded (20)

Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health Classification
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 

A Survey on Text Mining-techniques and application

  • 1. Introducing Journal: A Survey on Text Mining-techniques and application Ryota Eisaki January 14, 2021 Author: Prof.S M.Inazalkar, Jai sharma Publisher: International Journal of Research In Science Engineering 1
  • 2. Abstract In this paper, we combine data mining methods for analyzing quantitative and qualitative data from financial reports, in order to see if the textual part of the report contains some indication about future financial performance. 2
  • 3. 1. Introduction Text is the most common vehicle for the formal exchange of information. The tool is a text mining system which has the capability to analyze large quantities of natural language text and detects lexical and linguistic usage patterns in an attempt to extract meaningful and useful information. 3
  • 4. 2. Related Work This paper presents an inventive and valuable pattern discovery technique which includes the processes of pattern deploying and pattern evolving, to advance the effectiveness of using and updating discovered patterns for finding appropriate and interesting information. 4
  • 5. 3. Text Mining Process Document Gathering the text documents are collected in different formats Document Pre-Processing (a) Tokenization the given document string devided into one unit or token (b) Removal of Stop Word the removal of useful words like, a, but, of, etc. (c) Stemming consider inflectional and derivational stemming Text Transformation Vector Space Model and Bag of words 5
  • 6. 3. Text Mining Process Feature Selection (attribute selection) This method results in giving low database space, minimal search technique by taking out irrelevant feature from input document. Data mining/Pattern Selection Evaluate measures the outcome 6
  • 7. 4. Text Mining Techniques 1. Categorization the assignment of natural language documents to predefined categories according to their content. finding main theme of document by adding metadata and analyzing document. 7
  • 8. 4. Text Mining Techniques 1.1. Decision Trees rebuild the manual categorization of training documents The main advantage of decision tree is the fact that the output tree is easy to interpret even for persons who are not familiar with the details of the model. the risk is known as "over fitting" but there are ways to solve this. 8
  • 9. 4. Text Mining Techniques 2. Clustering divide the similar text into the same cluster. it differs from classification in than documents are clusters on the fly instead of through the use of pre-defined topics. 9
  • 10. 4. Text Mining Techniques 2.1.Hierarchical Clustering It works at different level of granularity. 1. Bottom up hierarchical clustering method: a Consider each document as a single cluster b Calculate similarity of cluster then marge the two clusters having maximum similarity c Repeat b till single cluster is formed. 2. Top down hierarchical clustering method: starts from a single cluster as whole, then it get split iteratively into various cluster on the basis of smallest similarity measure. The advantage of Hierarchical Clustering is that it can use any form of similarity measure and the disadvantage is that once the clusters are formed, cannot be rebuild. 10
  • 11. 4. Text Mining Techniques 3. Information Extraction the documents are first converted into the structured databases on which data mining techniques can be applied to extract knowledge or interesting patterns. 11
  • 12. 4. Text Mining Techniques 4. Information Visualization Visual text mining, or information visualization, puts large textual sources in a visual hierarchy or map and provides browsing capabilities, in addition to simple searching. 12
  • 13. 4. Text Mining Techniques 5. Natural Language Processing(NLP) NLP is a technology that concerns with natural language generation(NLG) and natural language understanding(NLU). 13
  • 14. 4. Text Mining Techniques 6. Topic Tracking A topic tracking system mechanism by custody of user profiles and, based on the documents the user views, guess other documents of interest to the user. 14
  • 15. 5. Applications • Business Intelligence • Bioinformatics • Security Application the analysis and monitoring of documents • Human Resource Management for the purpose of recruiting the candidate by reading and writing their CVs • Web Search Enhancement 15
  • 16. 5. Applications others • Publishing and media • Telecommunications, energy and other services industries • Information technology sector and Internet • Banks, insurance and financial markets • Pharmaceutical and research companies and healthcare 16
  • 17. 6. Conclusion Text mining can be defined as a technique which is used to extract interesting information or knowledge from the text documents which are usually in the unstructured form. In near future the proposed technique is implemented using JAVA technology and the comparative results are provided. Applications in the field such as identifying business intelligence, bioinformatics, security application, web search enhancement, CRM and other sectors are studied. 17