TEXT MINING
BY- ADITYA SHARMA
BCA-3E
03921402021
CONTENTS
1) INTRODUCTION
2) DATA MINING Vs. TEXT MINING
3) MOTIVATION FOR TEXT MINING
4) STEPS FOR TEXT MINING
5) KEY TERMS IN TEXT MINING
6) MERITS OF TEXT MINING
7) APPLICATIONS OF TEXT MINING
8) DEMERITS OF TEXT MINING
INTRODUCTION
1) Text Mining is the discovery of previously unknown information from text.
2) Text Mining is also referred to as Text Data Mining (TDM) and Knowledge Discovery in Textual Databases (KDT).
3) Text Mining is used to extract relevant information, knowledge or patterns from different sources that are in
unstructured or semi-structured form.
4) It extracts and discovers knowledge hidden in text automatically.
5) It aids domain experts by automatically:
-identifying concepts
-extracting facts/relations
-discovering implicit links
-generating hypotheses
DATA MINING VS. TEXT MINING
Data Mining                               | Text Mining
Process directly                          | Linguistic processing or natural language processing (NLP)
Identify causal relationships             | Discover heretofore unknown information
Structured data                           | Semi-structured & unstructured data (text)
Structured numeric transaction data       | Applications deal with much more diverse and
residing in a relational data warehouse   | eclectic collections of systems and formats
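The contrast in the comparison above can be illustrated with a small sketch. It is only an illustration under assumptions: the transaction records, the example sentences and the name `extract_amounts` are hypothetical, and a single regular expression stands in for real linguistic processing.

```python
import re

# Structured data: fields are already named and typed,
# so they can be processed directly.
transactions = [
    {"customer": "A", "amount": 120.0},
    {"customer": "B", "amount": 80.0},
]
structured_total = sum(row["amount"] for row in transactions)

# Unstructured text: the same facts are buried in sentences and must be
# recovered first (here with a deliberately crude pattern).
notes = [
    "Customer A paid 120.0 dollars on Monday.",
    "We received 80.0 dollars from customer B.",
]
AMOUNT_PATTERN = re.compile(r"(\d+(?:\.\d+)?) dollars")


def extract_amounts(texts):
    """Return every number that is directly followed by the word 'dollars'."""
    amounts = []
    for text in texts:
        for match in AMOUNT_PATTERN.findall(text):
            amounts.append(float(match))
    return amounts


text_total = sum(extract_amounts(notes))
print(structured_total, text_total)
```

Both totals agree here only because the sentences happen to follow the pattern; free text in practice needs far more robust processing, which is the point of the comparison.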
MOTIVATION FOR TEXT MINING
1) Approximately 90% of the world’s data is held in unstructured formats (source:
Oracle Corporation)
2) Information-intensive business processes demand that we move beyond simple
document retrieval to “knowledge” discovery.
STEPS FOR TEXT MINING
1) Pre-Processing the Text
2) Applying Text Mining Techniques
-Summarization
-Classification
-Clustering
-Visualization
-Information Extraction
3) Analyzing the Text
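The three steps above can be sketched in a few lines. This is a minimal sketch, not a definitive implementation: it assumes summarization as the chosen technique, scores sentences by simple word frequency, and uses a tiny hand-picked stop-word list. The names `preprocess`, `summarize`, `STOP_WORDS` and the sample text are hypothetical.

```python
import re
from collections import Counter

# Assumed, deliberately small stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "it", "from", "that"}


def preprocess(sentence):
    """Step 1: lowercase, tokenize into words, and drop stop words."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def summarize(text, max_sentences=1):
    """Step 2: keep the sentences whose words are most frequent in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    frequencies = Counter(tok for s in sentences for tok in preprocess(s))

    def score(sentence):
        tokens = preprocess(sentence)
        if not tokens:
            return 0.0
        # Average frequency, so long sentences are not favored automatically.
        return sum(frequencies[t] for t in tokens) / len(tokens)

    best = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Return the chosen sentences in their original order.
    return [s for s in sentences if s in best]


TEXT = (
    "Text mining extracts knowledge from text. "
    "The weather was pleasant yesterday. "
    "Mining text reveals hidden knowledge in documents."
)

# Step 3: analyze the result (here, simply inspect the summary).
print(summarize(TEXT))
```

Classification, clustering or information extraction could replace `summarize` in step 2 while the pre-processing step stays the same.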
KEY TERMS IN TEXT MINING
1) Information Retrieval (IR)
-The science of searching for:
  -Information in documents
  -Documents themselves
  -Metadata which describe documents
  -Text, sound, images or data within databases: relational stand-alone databases or hypertext
  networked databases such as the Internet or intranets.
2) Artificial Intelligence (AI)
-Artificial intelligence (AI) is a branch of computer science and engineering that deals with
intelligent behavior, learning, and adaptation in machines
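Searching for information in documents, as described in the first key term above, can be sketched with an inverted index. This is an illustrative sketch under assumptions: the three sample documents and the names `tokenize`, `build_index` and `search` are hypothetical, and matching is exact-word boolean AND with no ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


documents = {
    "doc1": "Text mining discovers knowledge in text",
    "doc2": "Data mining works on structured data",
    "doc3": "Knowledge discovery in textual databases",
}
index = build_index(documents)
print(search(index, "mining"))          # ['doc1', 'doc2']
print(search(index, "knowledge text"))  # ['doc1']
```

The second query misses doc3 because "textual" and "text" are different words to this index; stemming during pre-processing is one common way to close that gap.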
MERITS OF TEXT MINING
1) Databases store only a limited amount of information, whereas Text Mining
overcomes this limitation
2) Extraction of relevant information and relationships from natural language documents
3) Extraction of information from unstructured or semi-structured documents
APPLICATIONS OF TEXT MINING
1) Analysis of Market Trends
-Classification Technique
-Information Extraction Technique
2) Analysis and Screening of Junk Emails
-Classification on the basis of pre-defined, frequently occurring terms
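Screening junk email against a pre-defined list can be sketched as follows. This is a hedged illustration only: the term list `JUNK_TERMS`, the 0.2 threshold and the function names are assumptions made for the example, and real filters typically learn such terms from labeled mail rather than hard-coding them.

```python
import re

# Hypothetical pre-defined list of terms that occur frequently in junk mail.
JUNK_TERMS = {"free", "winner", "prize", "offer", "urgent"}


def junk_score(message):
    """Return the fraction of the message's words that appear in JUNK_TERMS."""
    words = re.findall(r"[a-z]+", message.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in JUNK_TERMS)
    return hits / len(words)


def is_junk(message, threshold=0.2):
    """Classify a message as junk when its score reaches the assumed threshold."""
    return junk_score(message) >= threshold


print(is_junk("Urgent offer: claim your free prize now"))    # True
print(is_junk("Meeting notes for the text mining project"))  # False
```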
DEMERITS OF TEXT MINING
1) Requires an initial learned information system for the initial extraction
2) Suitable programs have not yet been defined to analyze text for mining knowledge or information
3) Risk of misguided interpretations or misuse of information
THANK YOU
