TEXT MINING
seminar submitted by:
Ali Abdul_Zahraa
Msc,MathcompUOK
ali.abdulzahraa@gmail.com
Outline
Introduction
Data Mining vs Text Mining
Text Mining Process
Text Mining Applications
Challenges in Text Mining
Conclusion
Introduction
• What is Text Mining?
– Text mining is the analysis of data contained in
natural language text
Introduction
• Why Text Mining?
– A massive amount of new information is being
created: the world’s data doubles every 18 months
(Jacques Vallee, Ph.D.)
– 80-90% of all data is held in various
unstructured formats
– Useful information can be derived from this
unstructured data
Unstructured Data Examples “Ore”
• Email
• Insurance claims
• News articles
• Web pages
• Patent portfolios
• Customer complaint letters
• Contracts
• Transcripts of phone calls with customers
• Technical documents
Reasons for Text Mining
[Figure: bar chart comparing the percentage of data held as collections of text versus structured data; vertical axis "Percentage", 0 to 90]
How Text Mining Differs from Data Mining
Data Mining
• Identify data sets
• Select features
• Prepare data
• Analyze distribution
Text Mining
• Identify documents
• Extract features
• Select features by algorithm
• Prepare data
• Analyze distribution
Mining
• Filtering: remove punctuation and special characters.
• Segmentation: split the document into words.
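The filtering and segmentation steps can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names `filter_text` and `segment` are hypothetical, and lowercasing the tokens is an added assumption that the slides do not mention.

```python
import re

def filter_text(text: str) -> str:
    """Filtering: replace punctuation and special characters with spaces."""
    # \w matches letters, digits, and underscore; \s matches whitespace.
    # Every other character is treated as punctuation or a special character.
    return re.sub(r"[^\w\s]", " ", text)

def segment(text: str) -> list[str]:
    """Segmentation: split the filtered text into word tokens."""
    # Lowercasing is an assumption added here so that "Text" and "text" match.
    return filter_text(text).lower().split()

tokens = segment("Text mining, in short: analysis of natural-language text!")
print(tokens)
```

Splitting on whitespace is the simplest possible segmentation; languages without spaces between words would need a different approach.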
• Stemming: techniques used to find the root (stem) of a word.
– E.g.,
– user, users, used, using → stem (root): use
– engineering, engineered, engineer → stem (root): engineer
Usefulness
• Improving the effectiveness of retrieval and text mining
– matching similar words
• Reducing index size
– combining words with the same root may reduce index size by as much
as 40-50%.
Mining
Basic stemming methods:
• Remove endings
– If a word ends with a consonant other than s, followed by an s, then delete the s.
– If a word ends in es, drop the s.
– If a word ends in ing, delete the ing unless the remaining word consists only
of one letter or of th.
– If a word ends with ed, preceded by a consonant, delete the ed unless this
leaves only a single letter.
– …...
• Transform words
– If a word ends with “ies” but not “eies” or “aies”, then “ies” → “y”.
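A rough sketch of how these suffix rules might be coded is given below. Several things here are assumptions rather than part of the slides: the function name `basic_stem`, the order in which the rules are tried (the first matching rule wins), and the replacement of "ies" by "y", which is the usual completion of the transform rule.

```python
VOWELS = set("aeiou")

def is_consonant(ch: str) -> bool:
    """Treat any alphabetic character that is not a, e, i, o, u as a consonant."""
    return ch.isalpha() and ch not in VOWELS

def basic_stem(word: str) -> str:
    """Apply the first matching suffix rule; rule order is an assumption."""
    w = word.lower()
    # Transform rule: "ies" -> "y" (assumed), unless the word ends in "eies" or "aies".
    if w.endswith("ies") and not w.endswith(("eies", "aies")):
        return w[:-3] + "y"
    # "es" ending: drop only the final "s".
    if w.endswith("es"):
        return w[:-1]
    # Consonant other than "s", followed by "s": delete the "s".
    if len(w) >= 2 and w.endswith("s") and is_consonant(w[-2]) and w[-2] != "s":
        return w[:-1]
    # "ing" ending: delete it unless only one letter or "th" would remain.
    if w.endswith("ing"):
        rest = w[:-3]
        return rest if len(rest) > 1 and rest != "th" else w
    # "ed" preceded by a consonant: delete it unless a single letter would remain.
    if w.endswith("ed") and len(w) >= 3 and is_consonant(w[-3]):
        rest = w[:-2]
        if len(rest) > 1:
            return rest
    return w

for word in ["users", "engineering", "engineered", "queries", "thing"]:
    print(word, "->", basic_stem(word))
```

Rules this simple are crude: under this sketch both "using" and "used" reduce to "us" rather than "use", which is why practical stemmers add further rules.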
Mining
• Eliminate excessive words (stop words): words that carry no
meaning by themselves, such as prepositions, conjunctions,
and conditional particles.
This is done by comparing each word against a list
of such words.
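The list comparison described above might look like the following sketch. The contents of `STOP_WORDS` are a small illustrative sample chosen for this example, not a list taken from the slides, and the function name is hypothetical.

```python
# A tiny illustrative stop list (an assumption; real lists hold hundreds of words).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "and", "or", "but", "if", "is"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that do not appear in the stop list."""
    return [t for t in tokens if t.lower() not in stop_words]

print(remove_stop_words(["the", "analysis", "of", "text", "in", "documents"]))
```

Storing the list as a set keeps each lookup constant-time, which matters when the collection of documents is large.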
Canonical Names
President Bush
Mr. Bush
George Bush
Canonical Name:
George Bush
• The canonical name is the most explicit, least
ambiguous name constructed from the different
variants found in the document
• Reduces ambiguity of variants
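One possible heuristic for choosing a canonical name is sketched below. It assumes that variants sharing the same last word refer to the same entity and that the "most explicit" variant is the one with the most name words once titles are removed. Both assumptions, the `TITLES` list, and the function names are illustrative and are not the method described in the slides.

```python
from collections import defaultdict

# Hypothetical list of titles to ignore when comparing variants.
TITLES = {"president", "mr", "mrs", "ms", "dr"}

def strip_titles(name):
    """Split a name into words and drop titles such as 'Mr.' or 'President'."""
    parts = [p.strip(".") for p in name.split()]
    return [p for p in parts if p.lower() not in TITLES]

def canonical_names(variants):
    """Map each variant to the most explicit variant sharing its last word."""
    groups = defaultdict(list)
    for v in variants:
        core = strip_titles(v)
        if core:
            # Assumption: variants with the same surname denote the same entity.
            groups[core[-1].lower()].append((v, core))
    mapping = {}
    for members in groups.values():
        # Most explicit = most name words left after removing titles.
        best = max(members, key=lambda m: len(m[1]))[1]
        for v, _ in members:
            mapping[v] = " ".join(best)
    return mapping

print(canonical_names(["President Bush", "Mr. Bush", "George Bush"]))
```

Grouping by surname alone would wrongly merge different people with the same family name, so a real system would need more context than this sketch uses.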
Mining
• Clipping: eliminate words that appear with very high
or very low frequency.
o Low-frequency words form small
clusters that are not useful, and high-frequency
words appear in almost every document, so they are
not useful either.
o There are many ways to calculate a word’s
frequency in one or more documents.
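As one of the many possible ways to measure frequency, the sketch below counts in how many documents each word occurs (document frequency) and clips both extremes. The thresholds `min_docs` and `max_ratio` and the function name are assumptions chosen for illustration.

```python
from collections import Counter

def clip_vocabulary(docs, min_docs=2, max_ratio=0.9):
    """Return the words kept after clipping.

    docs is a list of token lists. A word is kept if it occurs in at least
    min_docs documents and in at most max_ratio of all documents.
    """
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each word once per document
    n = len(docs)
    return {w for w, c in df.items() if c >= min_docs and c / n <= max_ratio}

docs = [
    ["text", "mining", "data"],
    ["text", "mining", "web"],
    ["text", "spam", "filter"],
    ["text", "spam", "mail", "data"],
]
print(sorted(clip_vocabulary(docs)))
```

In this toy collection "text" occurs in every document and is clipped as too frequent, while "web", "filter", and "mail" occur only once and are clipped as too rare.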
Mining
• Clustering: group interrelated
documents based on their topics.
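A very small clustering sketch is shown below. The slides do not name a clustering algorithm, so this uses a greedy single-pass scheme with Jaccard similarity between word sets as a stand-in; the threshold value and function names are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between two token collections (shared / total distinct words)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_documents(docs, threshold=0.3):
    """Greedy single-pass clustering; returns lists of document indices."""
    clusters = []
    for i, tokens in enumerate(docs):
        for cluster in clusters:
            # Compare against the first document of the cluster, used as its representative.
            if jaccard(tokens, docs[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append([i])
    return clusters

docs = [
    ["spam", "mail", "filter"],
    ["gene", "disease", "clinical"],
    ["spam", "filter", "email"],
    ["gene", "clinical", "study"],
]
print(cluster_documents(docs))
```

The result of a single-pass scheme depends on document order; methods such as k-means over weighted term vectors are common alternatives.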
Text Mining: Analysis
• Which words are most frequent?
• Which words are most interesting?
• Which words help define the document?
• What are the interesting text phrases?
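One common way to approach the questions above is TF-IDF weighting, which scores a word highly when it is frequent in one document but rare across the collection. The slides do not name this technique, so the sketch below is offered as one plausible option; the function names are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: score} dictionary per document (docs is a list of token lists)."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each word once per document
    scores = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens)
        # Term frequency times the log of the inverse document frequency.
        scores.append({w: (c / total) * math.log(n / df[w]) for w, c in tf.items()})
    return scores

def top_words(doc_scores, k=3):
    """The k highest-scoring words of one document, ties broken alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]

docs = [
    ["text", "mining", "text", "spam"],
    ["text", "gene", "gene", "disease"],
]
scores = tf_idf(docs)
print(top_words(scores[1], k=1))
```

A word that appears in every document, such as "text" here, receives a score of zero, which matches the intuition that it does not help define any single document.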
Text mining applications
• Call Center Software.
• Anti-Spam.
• Market Intelligence.
• Web mining.
Actual examples
• A clinical center in the USA was able to
identify a gene responsible for
a harmful disease by processing more than
150,000 newspaper articles.
• Text mining of the Holy Quran.
• Etc.
Challenges in Text Mining
• Information is in unstructured textual form, written
in natural language (NL).
• It is not readily accessible to computers.
• Huge collections of documents must be handled.
• A skilled person is required to choose which documents
to process and to analyze the output.
• It requires more time.
• Cost: $50,000 for the software alone.
More information
• The Central Intelligence Agency (CIA) is the strongest
supporter of text mining:
- the events of September 11;
- mining of e-mail, chat rooms, and social
networks;
- it therefore supports many companies, such as
Attensity, Inxight, and Intelliseek.
More information
• SPSS company statistics: text mining software
has far fewer users than data mining
software.
Conclusion
• Most sources indicate that the field of text
mining is still in the research phase,
• and its applications are still in limited operation at
the present time,
• but the possibilities it offers,
helping to understand huge amounts
of text and to extract the core information
that is important, give it useful prospects
in many areas.