Text Mining: Experience

Experience in analyzing data with text mining

Published in: Technology, Education

  1. Experience in Analyzing Data with Text Mining
     Dr. Alisa Kongthon, Researcher, Human Language Technology Laboratory, National Electronics and Computer Technology Center
  2. Text Mining is about…
     “Sifting through vast collections of unstructured or semistructured data beyond the reach of data mining tools, text mining tracks information sources, links isolated concepts in distant documents, maps relationships between activities, and helps answer questions.”
     Source: Tapping the Power of Text Mining, Communications of the ACM, Sept. 2006
  3. Humans vs. Computers
     • Humans: ability to distinguish and apply linguistic patterns to text
       – Can overcome language difficulties such as slang, spelling variations, and contextual meaning
     • Computers: ability to process text in large volumes at high speed
       – Can sift through a large collection of texts to find simple statistics and relationships among terms almost instantly
     • Text mining requires a combination of both: humans' linguistic capability (NLP) + computers' speed and accuracy (data mining)
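The "simple statistics and relationships among terms" that computers find quickly can be illustrated with a minimal sketch. It assumes English text, a naive lowercase letters-only tokenizer, and co-occurrence counted once per document; the function name `term_statistics` and the sample documents are invented for illustration and do not come from the slides.

```python
import re
from collections import Counter
from itertools import combinations


def term_statistics(documents):
    """Return (term frequencies, per-document term co-occurrence counts)."""
    term_counts = Counter()
    pair_counts = Counter()
    for text in documents:
        # Naive tokenizer (assumption): lowercase runs of ASCII letters.
        tokens = re.findall(r"[a-z]+", text.lower())
        term_counts.update(tokens)
        # Count each unordered pair of distinct terms once per document.
        for pair in combinations(sorted(set(tokens)), 2):
            pair_counts[pair] += 1
    return term_counts, pair_counts


# Hypothetical sample documents.
docs = [
    "text mining finds patterns in text",
    "data mining finds patterns in data",
]
terms, pairs = term_statistics(docs)
print(terms.most_common(3))
print(pairs[("mining", "patterns")])
```

Pairs are stored in alphabetical order, so a lookup such as `pairs[("mining", "patterns")]` must use the same ordering.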
  4. Text Mining Tasks
     • Information extraction:
       – Analyze unstructured text and identify key words or phrases and relationships within the text
     • Topic detection and tracking:
       – Filter and present only documents relevant to the user profile
     • Summarization:
       – Text summarization reduces the content by retaining only its main points and overall meaning
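As one illustration of the summarization task, the sketch below scores each sentence by the average corpus frequency of its words and keeps the top-scoring ones in their original order. This frequency-based extractive approach is an assumption chosen for brevity, not the method used in the talk; the stopword list, the `summarize` function, and the sample text are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that", "by", "its"}


def content_words(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


sample = (
    "Text mining extracts information from text. "
    "The weather was nice. "
    "Text mining tools process large text collections."
)
print(summarize(sample, max_sentences=2))
```

Averaging by sentence length keeps long sentences from winning simply because they contain more words.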
  5. Text Mining Tasks
     • Categorization:
       – Automatically classify documents into predefined categories
     • Clustering:
       – Group documents based on their similarity
     • Concept linkage:
       – Connect related documents by identifying their shared concepts, helping users find information they perhaps wouldn't have found through traditional search methods
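Grouping documents "based on their similarity" needs a similarity measure. A minimal sketch, assuming bag-of-words vectors, cosine similarity, and a greedy single-pass clustering with a hypothetical threshold of 0.3; none of these choices come from the slides, and the sample documents are invented.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Term-count vector from a naive lowercase tokenizer (assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(documents, threshold=0.3):
    """Greedy single pass: join the first cluster whose seed document is similar enough."""
    clusters = []  # list of (seed vector, member indices)
    for i, text in enumerate(documents):
        vec = bag_of_words(text)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


# Hypothetical sample documents.
docs = [
    "hotel room clean staff friendly",
    "clean hotel room great staff",
    "mobile network signal weak",
    "network signal drops on mobile",
]
print(cluster(docs))
```

The same `cosine` function could serve categorization by comparing a new document against one labelled example per predefined category.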
  6. Text Mining Tasks
     • Information visualization:
       – Represent documents or information in graphical formats for easy browsing, viewing, or searching
     • Question answering (Q&A):
       – Search for and extract the best answer to a given question
  7. Applications: Tech Mining
     • Tech Mining is the application of text mining tools to science and technology (S&T) information, particularly bibliographic abstracts
     • It exploits S&T databases to see patterns, detect associations, and foresee opportunities
  8. Tech Mining Process
  9. Technical Intelligence: Who, What, When, Where?
     • Digest multiple S&T information resources
     • Profile research domains:
       – Who?
       – What?
       – When?
       – Where?
     • Map relationships: topics and teams
     • Analyze trends: what's hot and what's coming
     • And do so quickly
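Profiling a research domain by who, what, when, and where amounts to counting fields across bibliographic records. The sketch below assumes each record is a dictionary with `authors`, `keywords`, `year`, and `country` fields; the field names and the sample records are invented for illustration and are not taken from any Tech Mining tool.

```python
from collections import Counter

# Hypothetical bibliographic records (invented for illustration).
records = [
    {"authors": ["A. Author", "B. Author"], "keywords": ["text mining"], "year": 2009, "country": "Thailand"},
    {"authors": ["A. Author"], "keywords": ["text mining", "opinion mining"], "year": 2010, "country": "Thailand"},
    {"authors": ["C. Author"], "keywords": ["opinion mining"], "year": 2010, "country": "USA"},
]


def profile(records):
    """Tally authors (who), keywords (what), years (when), and countries (where)."""
    who, what, when, where = Counter(), Counter(), Counter(), Counter()
    for record in records:
        who.update(record["authors"])
        what.update(record["keywords"])
        when[record["year"]] += 1
        where[record["country"]] += 1
    return {"who": who, "what": what, "when": when, "where": where}


result = profile(records)
print(result["who"].most_common(1))
print(sorted(result["when"].items()))
```

Sorting the `when` counts by year gives a simple publication trend line for the domain.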
  10. What if I don’t have Tech Mining Software?
  11. What if I don’t have Tech Mining Software?
  12. Output example from Tech Mining Software
      Source: A.L. Porter, QTIP: quick technology intelligence processes, Technol. Forecast. Soc. Change 72 (2005)
  13. Applications: Expert Finder
  14. Applications: Expert Finder
  15. Applications: Expert Finder
  16. Applications: ABDUL (Artificial BudDy U Love)
      • An online information service that currently provides access to Thai linguistic resources (e.g., dictionary and sentence translation) and information resources (e.g., weather conditions, stock prices, gas prices, traffic conditions)
      • Users can interact with ABDUL in natural language via an Instant Messaging (IM) based protocol, a Web browser, and mobile devices
  17. Applications: ABDUL (Artificial BudDy U Love)
  18. Applications: ABDUL (Artificial BudDy U Love)
  19. Web 1.0 vs. Web 2.0
  20. User-Generated Content
      • With Web 2.0 and social networking websites, the amount of user-generated content has increased exponentially
      • User-generated content often contains opinions and/or sentiments
      • An in-depth analysis of these opinionated texts could reveal potentially useful information, e.g.,
        – People's preferences on many different topics, including news events, social issues, and commercial products
  21. Online Opinion Resources
  22. Characteristics of Online Reviews
      • Natural language and unstructured text format
      • Some reviews are long and contain only a few sentences expressing opinions on the product
      • It can be difficult for a prospective reader to understand and analyze every review that may be relevant to his or her decision making
  23. Opinion Mining
      • Opinion mining, or sentiment analysis, is the task of analyzing and summarizing what people think about a certain topic
      • Opinion mining has gained a lot of interest in the text mining and NLP communities
      • Three granularities of opinion mining:
        – Document level
        – Sentence level
        – Feature level
  24. Feature-Based Opinion Mining
      • This approach typically consists of the following two steps:
        1. Identifying and extracting features of an object, topic, or event from each sentence
        2. Determining whether the opinions regarding the features are positive or negative
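The two steps above can be sketched with a lexicon lookup. This is a deliberately simple stand-in, not the system behind the hotel or mobile operator results in the talk: the feature list, the polarity word lists, and the `feature_opinions` function are all hypothetical, and the sketch ignores negation ("not clean") and sentences that mix opinions on several features.

```python
import re

# Hypothetical lexicons (assumptions for illustration only).
FEATURES = {"room", "staff", "location", "breakfast"}
POSITIVE = {"clean", "friendly", "great", "good"}
NEGATIVE = {"dirty", "rude", "noisy", "bad"}


def feature_opinions(review):
    """Return (feature, polarity) pairs found sentence by sentence."""
    results = []
    for sentence in re.split(r"[.!?]+", review):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        # Step 1: identify the features mentioned in this sentence.
        features = [t for t in tokens if t in FEATURES]
        # Step 2: decide polarity from the opinion words in the same sentence.
        score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
        if score == 0:
            continue  # no opinion, or positive and negative words cancel out
        polarity = "positive" if score > 0 else "negative"
        results.extend((feature, polarity) for feature in features)
    return results


review = "The room was clean. The staff were rude! We arrived on Monday."
print(feature_opinions(review))
```

Aggregating these pairs over many reviews gives per-feature positive and negative counts, which is the kind of summary a graphical or textual display can then present.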
  25. Opinion Mining on Hotel Reviews in Thailand (Graphical Display)
  26. Opinion Mining on Hotel Reviews in Thailand (Textual Display)
  27. Comparison among Hotels
  28. Opinion Mining on Mobile Network Operators in Thailand
  29. Opinion Mining on Mobile Network Operators in Thailand
  30. Challenges in Text Mining
      • Text Mining = NLP + Data Mining
      • Statistical NLP
        – Ambiguity
        – Context
        – Tokenization
        – Sentence detection
        – POS tagging
      • Data Mining
        – Ability to process the data
        – Massive amounts of data
        – Determining and extracting information of interest
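Sentence detection shows why ambiguity is a challenge: a period can end a sentence or belong to an abbreviation. The sketch below contrasts a naive splitter with one that consults a small abbreviation list. It assumes space-delimited English text, and the abbreviation list and function names are illustrative only.

```python
import re

# Small illustrative abbreviation list (assumption).
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}


def naive_sentences(text):
    """Split after every '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentences(text):
    """Split at sentence-final punctuation unless the token is a known abbreviation."""
    result, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            result.append(" ".join(current))
            current = []
    if current:  # trailing text without final punctuation
        result.append(" ".join(current))
    return result


text = "Dr. Smith reviewed the hotel. The room was clean."
print(naive_sentences(text))
print(sentences(text))
```

The naive splitter breaks after "Dr.", while the abbreviation-aware version keeps the first sentence whole. An abbreviation that really does end a sentence would still be mishandled, which is one reason statistical methods are used for this step.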
  31. Conclusions
      • As the amount of data increases, text-mining tools that sift through it will be increasingly valuable
      • Various applications for academic and industry use
  32. Thank you for your attention
      Q&A
