Text Mining

TEXT MINING
Presented By:
Prakhyath Rai
Asst. Professor, Dept. of ISE,
SCEM, Mangaluru

Outline
 Introduction
 Data Mining Vs. Text Mining
 Motivation for Text Mining
 I/O Model for Text Mining
 Steps for Text Mining
 Key Terms in Text Mining
 Text Mining Frameworks
 Merits of Text Mining
 Applications of Text Mining
 Demerits of Text Mining
 References
Prakhyath Rai, Asst. Professor, Dept. of ISE, SCEM, Mangaluru-575007

Introduction
Text Mining is a discovery process.
Text Mining is also referred to as Text Data Mining (TDM)
and Knowledge Discovery in Textual Databases (KDT).
Text Mining is used to extract relevant information,
knowledge, or patterns from different sources that are in
unstructured or semi-structured form.

Introduction Cont.
Extract and discover knowledge hidden in text
automatically
Aid domain experts by automatically:
 identifying concepts
 extracting facts/relations
 discovering implicit links
 generating hypotheses

Data Mining vs. Text Mining
Data Mining | Text Mining
Processes data directly | Requires linguistic processing or natural language processing (NLP)
Identifies causal relationships | Discovers heretofore unknown information
Structured data | Semi-structured and unstructured data (text)
Structured numeric transaction data residing in a relational data warehouse | Applications deal with much more diverse and eclectic collections of systems and formats

Motivation for Text Mining
Approximately 90% of the world’s data is held in
unstructured formats (source: Oracle Corporation)
Information-intensive business processes demand that we
move beyond simple document retrieval to “knowledge”
discovery.

Input-Output Model for Text Mining
Input (Documents) -> Text Mining Technique -> Output (Patterns, Connections, Trends)
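
The input-output model above can be illustrated with a minimal Python sketch. It assumes that "patterns" means frequent terms and "connections" means pairs of terms that occur in the same document; both readings are assumptions made here for illustration, and the function names `tokenize` and `mine` are hypothetical. Trends are left out because they would need dated documents, which the sample data does not have.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def mine(documents, top_n=3):
    """Map input documents to frequent terms and co-occurring term pairs."""
    term_counts = Counter()  # "patterns": how often each term occurs overall
    pair_counts = Counter()  # "connections": terms sharing a document
    for doc in documents:
        tokens = tokenize(doc)
        term_counts.update(tokens)
        # Count each unordered pair of distinct terms once per document.
        pair_counts.update(combinations(sorted(set(tokens)), 2))
    return {
        "patterns": term_counts.most_common(top_n),
        "connections": pair_counts.most_common(top_n),
    }


# Hypothetical sample input.
documents = [
    "text mining finds patterns",
    "text mining finds trends",
    "data mining uses tables",
]
result = mine(documents)
print(result["patterns"])
print(result["connections"])
```

Counting each pair once per document keeps a long document that repeats a word from dominating the connection counts.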

Steps for Text Mining
Pre-Processing the Text
Applying Text Mining Techniques
   Summarization
   Classification
   Clustering
   Visualization
   Information Extraction
Analyzing the Text
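
The three steps above can be sketched in Python as follows. This is only one possible reading: pre-processing is taken to mean lowercasing, tokenizing and stop-word removal, the technique applied is a simple term-frequency extractive summarizer, and analysis is a look at the most frequent terms. The stop-word list and the function names `preprocess`, `summarize` and `analyze` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "from", "in", "is", "of", "or", "the", "to"}


def preprocess(text):
    """Step 1: lowercase, tokenize, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def summarize(text, max_sentences=1):
    """Step 2: keep the sentences whose terms are most frequent overall."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(preprocess(text))
    # Score a sentence by summing the document-wide frequency of its terms.
    ranked = sorted(
        sentences,
        key=lambda s: sum(freq[t] for t in preprocess(s)),
        reverse=True,
    )
    chosen = set(ranked[:max_sentences])
    # Return the chosen sentences in their original order.
    return [s for s in sentences if s in chosen]


def analyze(text, top_n=3):
    """Step 3: inspect the most frequent terms after pre-processing."""
    return Counter(preprocess(text)).most_common(top_n)


# Hypothetical sample text.
text = (
    "Text mining extracts knowledge from text. "
    "The weather is nice. "
    "Mining text reveals patterns in text documents."
)
print(summarize(text))
print(analyze(text))
```

Summing term frequencies favors longer sentences; dividing the score by sentence length is a common adjustment when that bias matters.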

Key Terms in Text Mining
 Information Extraction (IE)
The science of searching for information in documents, for
documents themselves, for metadata which describe documents,
or for text, sound, images or data within databases, whether
relational stand-alone databases or hypertext networked
databases such as the Internet or intranets.
 Artificial Intelligence (AI)
A branch of computer science and engineering that deals with
intelligent behavior, learning, and adaptation in machines.

Merits of Text Mining
A Database stores only a limited amount of Information,
whereas Text Mining overcomes this limitation
Extraction of relevant Information and Relationships
from Natural Language Documents
Extraction of Information from Unstructured or
Semi-structured Documents

Applications of Text Mining
Analysis of Market Trends
   Classification Technique
   Information Extraction Technique
Analysis and Screening of Junk Emails
   Classification on the basis of pre-defined frequently occurring items
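
As a rough illustration of junk email screening, the Python sketch below classifies a message by the share of its words that appear in a pre-defined list of frequently occurring junk terms. The term list, the threshold of 0.2, and the names `junk_score` and `classify` are all assumptions made for this example; a practical filter would learn its terms and weights from labeled mail.

```python
import re

# Hypothetical pre-defined terms assumed to occur frequently in junk mail.
JUNK_TERMS = {"free", "offer", "prize", "urgent", "winner"}


def junk_score(message):
    """Return the fraction of word tokens that are pre-defined junk terms."""
    tokens = re.findall(r"[a-z]+", message.lower())
    if not tokens:
        return 0.0
    return sum(t in JUNK_TERMS for t in tokens) / len(tokens)


def classify(message, threshold=0.2):
    """Label a message 'junk' when its junk score reaches the threshold."""
    return "junk" if junk_score(message) >= threshold else "legitimate"


# Hypothetical sample messages.
emails = [
    "Urgent: claim your free prize now",
    "Meeting moved to Monday at ten",
]
labels = [classify(e) for e in emails]
print(labels)
```

Using a fraction instead of a raw count keeps a long legitimate message from being flagged over a single matching word.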

Demerits of Text Mining
Requires an Initial Learned Information System for
Initial Extraction
Suitable programs have not yet been defined to Analyze
Text for Mining Knowledge or Information

References
[1] R Baeza-Yates and B Ribeiro-Neto, “Modern Information Retrieval”, ACM
Press, New York, 1999.
[2] Ning Zhong, Yuefeng Li and T. Grance, “Effective Pattern Discovery for Text
Mining”, IEEE Transactions on Knowledge and Data Engineering, Vol. 24, No. 1,
January 2012.
[3] Raymond J Mooney and Un Yong Nahm, “Text Mining with Information
Extraction”, Proceedings of the 4th International MIDP Colloquium,
pages 141-160, Van Schaik Pub., South Africa, 2005.
[4] M E Califf and R J Mooney, “Relational Learning of Pattern-Match Rules for
Information Extraction”, Proceedings of the 16th National Conference on Artificial
Intelligence (AAAI-99), pages 328-334, Orlando, FL, July 1999.
[5] D Freitag and N Kushmerick, “Boosted Wrapper Induction”, Proceedings of
the 17th National Conference on Artificial Intelligence (AAAI-2000),
pages 577-583, Austin, TX, July 2000.
