TEXT MINING
Presented By:
Prakhyath Rai
Asst. Professor, Dept. of ISE,
SCEM, Mangaluru
Outline
- Introduction
- Data Mining vs. Text Mining
- Motivation for Text Mining
- I/O Model for Text Mining
- Steps for Text Mining
- Key Terms in Text Mining
- Text Mining Frameworks
- Merits of Text Mining
- Applications of Text Mining
- Demerits of Text Mining
- References
Prakhyath Rai, Asst. Professor, Dept. of ISE, SCEM, Mangaluru-575007
Introduction
Text Mining is a discovery process.
Text Mining is also referred to as Text Data Mining (TDM)
and Knowledge Discovery in Textual Databases (KDT).
Text Mining is used to extract relevant information,
knowledge, or patterns from different sources that are in
unstructured or semi-structured form.
Introduction (Cont.)
Extract and discover knowledge hidden in text automatically.
Aid domain experts by automatically:
- identifying concepts
- extracting facts/relations
- discovering implicit links
- generating hypotheses
Data Mining vs. Text Mining
Data Mining | Text Mining
Processes data directly | Requires linguistic processing or natural language processing (NLP)
Identifies causal relationships | Discovers heretofore unknown information
Structured data | Semi-structured and unstructured data (text)
Structured numeric transaction data residing in a relational data warehouse | Applications deal with much more diverse and eclectic collections of systems and formats
Motivation for Text Mining
Approximately 90% of the world’s data is held in
unstructured formats (source: Oracle Corporation).
Information-intensive business processes demand that we
move beyond simple document retrieval to “knowledge”
discovery.
Input-Output Model for Text Mining
Input: Documents
Process: Text Mining Technique
Output: Patterns, Connections, Trends
Steps for Text Mining
1. Pre-Processing the Text
2. Applying Text Mining Techniques
   - Summarization
   - Classification
   - Clustering
   - Visualization
   - Information Extraction
3. Analyzing the Text
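The three steps can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the stop-word list, the function names, and the sample documents are all assumptions made for the example, and plain term counting stands in for a real technique such as classification or clustering.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real systems use larger lists.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "from", "for"}

def preprocess(text):
    """Step 1: lowercase the text, split it into words, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def term_frequencies(documents):
    """Step 2: apply a simple technique (term counting) to each document."""
    return [Counter(preprocess(doc)) for doc in documents]

def top_terms(frequencies, n=3):
    """Step 3: analyze the result by ranking terms across all documents."""
    total = Counter()
    for freq in frequencies:
        total.update(freq)
    return total.most_common(n)

# Made-up sample documents.
docs = [
    "Text mining extracts knowledge from unstructured text.",
    "Clustering groups similar text documents.",
]
print(top_terms(term_frequencies(docs), n=1))  # [('text', 3)]
```

Each function maps to one step, so a different technique can be swapped into step 2 without touching pre-processing or analysis.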
Key Terms in Text Mining
- Information Extraction (IE)
  The science of searching for:
  - information in documents
  - the documents themselves
  - metadata which describe documents
  - text, sound, images, or data within databases, whether relational stand-alone databases or hypertext networked databases such as the Internet or intranets
- Artificial Intelligence (AI)
  A branch of computer science and engineering that deals with intelligent behavior, learning, and adaptation in machines.
Merits of Text Mining
A database limits itself to storing a restricted amount of
information, whereas text mining overcomes this limitation.
Extraction of relevant information and relationships
from natural-language documents.
Extraction of information from unstructured or
semi-structured documents.
Applications of Text Mining
- Analysis of Market Trends
  - Classification Technique
  - Information Extraction Technique
- Analysis and Screening of Junk Emails
  - Classification on the basis of pre-defined frequently occurring items
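Junk email screening based on pre-defined frequently occurring items can be illustrated roughly as follows. This is only a sketch under stated assumptions: the term list, the 0.2 threshold, and the function names are hypothetical, and practical filters usually learn such terms from labeled mail rather than hard-coding them.

```python
import re

# Hypothetical list of terms assumed to occur frequently in junk email.
JUNK_TERMS = {"free", "winner", "prize", "urgent", "offer"}

def junk_score(message):
    """Return the fraction of words in the message that are junk terms."""
    tokens = re.findall(r"[a-z]+", message.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in JUNK_TERMS)
    return hits / len(tokens)

def is_junk(message, threshold=0.2):
    """Classify a message as junk when its score reaches the threshold."""
    return junk_score(message) >= threshold

print(is_junk("Urgent: claim your free prize now"))  # True
print(is_junk("Minutes of the department meeting"))  # False
```

The threshold trades missed junk against wrongly flagged mail; the value here is arbitrary and would need tuning on real messages.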
Demerits of Text Mining
Requires an initial learned information system for the
initial extraction.
Suitable programs have not yet been defined to analyze
text for mining knowledge or information.
References
[1] R. Baeza-Yates and B. Ribeiro-Neto, “Modern Information Retrieval”, ACM Press, New York, 1999.
[2] Ning Zhong, Yuefeng Li and Sheng-Tang Wu, “Effective Pattern Discovery for Text Mining”, IEEE Transactions on Knowledge and Data Engineering, Vol. 24, No. 1, January 2012.
[3] Raymond J. Mooney and Un Yong Nahm, “Text Mining with Information Extraction”, Proceedings of the 4th International MIDP Colloquium, pages 141-160, Van Schaik Pub., South Africa, 2005.
[4] M. E. Califf and R. J. Mooney, “Relational Learning of Pattern-Match Rules for Information Extraction”, Proceedings of the 16th National Conference on Artificial Intelligence (AAAI-99), pages 328-334, Orlando, FL, July 1999.
[5] D. Freitag and N. Kushmerick, “Boosted Wrapper Induction”, Proceedings of the 17th National Conference on Artificial Intelligence (AAAI-2000), pages 577-583, Austin, TX, July 2000.