Welcome to Our Presentation
Course Name: Data Mining
Course Code: CSE 450
Section: I
TEXT MINING
PRESENTED BY:
MD. MAHAMUD HASAN
MUSHFIQUR RAHMAN
Outline
• Introduction
• Data Mining vs Text Mining
• Text Mining Process
• Text Mining Applications
• Challenges in Text Mining
• Conclusion
Introduction
• What is text mining, and why use it?
• Text mining is the analysis of data contained in natural-language text.
• A massive amount of new information is being created: the world's data doubles every 18 months (Jacques Vallee, Ph.D.).
• 80-90% of all data is held in various unstructured formats.
• Useful information can be derived from this unstructured data.
Reasons for Text Mining
Collections of Text
Structured Data
How Text Mining Differs from Data Mining
Data Mining
• Identify data sets
• Select features
• Prepare data
• Analyze distribution
Text Mining
• Identify documents
• Extract features
• Select features by algorithm
• Prepare data
• Analyze distribution
Text Mining Process
• Text preprocessing: syntactic/semantic text analysis
• Feature generation: bag of words
• Feature selection: simple counting, statistics
• Text/data mining: classification (supervised learning), clustering (unsupervised learning)
• Analyzing results: mapping/visualization, result interpretation
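
The stages above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' method: the toy documents, the labels, the function names, and the choice of multinomial naive Bayes with add-one smoothing as the classifier are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Text preprocessing: lowercase and split into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Feature generation: map each word to its count in the document."""
    return Counter(tokenize(text))


def select_features(bags, min_docs=2):
    """Feature selection by simple counting: keep words that occur
    in at least min_docs documents."""
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())
    return {word for word, n in doc_freq.items() if n >= min_docs}


def train_naive_bayes(bags, labels, vocab):
    """Supervised learning: collect per-class document and word counts."""
    class_docs = Counter(labels)
    word_counts = {label: Counter() for label in class_docs}
    for bag, label in zip(bags, labels):
        for word, n in bag.items():
            if word in vocab:
                word_counts[label][word] += n
    return class_docs, word_counts


def classify(text, class_docs, word_counts, vocab):
    """Return the class with the highest log-probability for the text."""
    bag = bag_of_words(text)
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        total_words = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for word, n in bag.items():
            if word in vocab:
                # Add-one (Laplace) smoothing avoids log(0) for unseen words.
                p = (word_counts[label][word] + 1) / (total_words + len(vocab))
                score += n * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy collection, invented for this sketch.
docs = [
    "Win a free prize now",
    "Free prize offer, claim now",
    "Meeting agenda for the project",
    "Project meeting moved to Monday",
]
labels = ["spam", "spam", "ham", "ham"]

bags = [bag_of_words(d) for d in docs]
vocab = select_features(bags, min_docs=2)
class_docs, word_counts = train_naive_bayes(bags, labels, vocab)

print(sorted(vocab))
print(classify("Claim your free prize", class_docs, word_counts, vocab))
```

The final stage, analyzing results, would then inspect which words drive each decision, for example by listing the highest-count words per class in word_counts.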
Text Mining Applications
• Call center software
• Anti-spam
• Market intelligence
• Web mining
• Web log analysis
Challenges in Text Mining
• Information is in unstructured textual form, written in natural language (NL).
• It is not readily accessible to computers.
• Huge collections of documents must be handled.
• A skilled person is required to choose which documents to process and to analyze the output.
• More time is required.
• Cost: $50,000 for the software alone.
Conclusion
• Most sources indicate that the field of text mining is still in the research phase and that its applications are in limited operation at present.
• However, the possibilities it offers, helping to understand huge amounts of text and to extract the information that matters, hold useful prospects in many areas.
Thank You For Your Attention

Text mining presentation in Data mining Area
