NLTK: Natural Language Toolkit overview and application @ PyHug

NLTK is a Python toolkit for natural language processing. In these slides, the author gives an overview of NLTK and demonstrates an application to Chinese text classification.

Published in: Technology, Education

  1. NLTK: Natural Language Toolkit Overview and Application • Jimmy Lai, Jimmy.lai@oi-sys.com • Software Engineer @ Oxygen Intelligence • 2012/03/21
  2. Outline • 1. An application based on NLP: 聚寶評 • 2. Introduction to Natural Language Processing • 3. Brief History of NLTK • 4. Overview of NLTK • 5. Application of NLTK: Topic Classification on PTT
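  3. 聚寶評 (www.ezpao.com): a food search engine
  4. 聚寶評 (www.ezpao.com): a semantic-analysis search engine
  5. Review topic analysis • Analysis of dishes shared by users • Positive/negative review analysis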
  6. Natural Language Processing (NLP) • Speech recognition • Part-of-speech tagging • Parsing • Natural language generation • Text classification • Information extraction • Machine translation • Textual entailment (via Wikipedia)
  7. NLTK: Natural Language Toolkit • http://www.nltk.org/ • Authors: Steven Bird, Edward Loper, Ewan Klein • Originally developed for a class whose students had a background in either computer science or linguistics. • Currently: – Education: over 100 courses in 23 countries. – Research: over 250 papers cite NLTK.
  8. Outline • 1. An application based on NLP: 聚寶評 • 2. Introduction to Natural Language Processing • 3. Brief History of NLTK • 4. Overview of NLTK • 5. Application of NLTK: Topic Classification on PTT
  9. NLP in NLTK – Annotated Text Corpora • Resources: from nltk.corpus import *
  10. NLP in NLTK – Text Tokenization, Normalization • Text Processing Flow • Resources: from nltk.tokenize import *
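
The tokenization and normalization step named on slide 10 might look roughly like the sketch below. It is only an illustration, not the flow shown on the slide: the helper name `tokenize_and_normalize` is hypothetical, and the choice of `wordpunct_tokenize` (a regex-based tokenizer that needs no downloaded models) with lowercasing as the normalization is an assumption.

```python
from nltk.tokenize import wordpunct_tokenize


def tokenize_and_normalize(text):
    """Hypothetical helper: split text into tokens, then normalize them.

    Normalization here means lowercasing and dropping tokens that are
    not purely alphabetic (punctuation, numbers).
    """
    tokens = wordpunct_tokenize(text)  # splits on the pattern \w+|[^\w\s]+
    return [t.lower() for t in tokens if t.isalpha()]


tokens = tokenize_and_normalize("NLTK is a Python toolkit for NLP.")
print(tokens)
```

A real pipeline would typically add further normalization such as stemming or stop-word removal, which this sketch leaves out.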
  11. NLP in NLTK – Part-of-speech Tagging • Resources: from nltk.tag import *
  12. NLP in NLTK – Text Classification • Resources: from nltk.classify import *
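
As a rough illustration of what `nltk.classify` from slide 12 offers, the sketch below trains a naive Bayes classifier on bag-of-words feature dictionaries. The training examples, the labels, and the helper `word_features` are all made up for this example; the slide does not say which classifier or features the author used.

```python
from nltk.classify import NaiveBayesClassifier


def word_features(words):
    """Hypothetical feature extractor: mark each word as present."""
    return {word: True for word in words}


# Toy training set, invented for illustration only.
train_set = [
    (word_features(["noodle", "restaurant", "delicious"]), "Food"),
    (word_features(["beef", "soup", "tasty"]), "Food"),
    (word_features(["pitcher", "inning", "homerun"]), "Baseball"),
    (word_features(["team", "pitcher", "win"]), "Baseball"),
]

classifier = NaiveBayesClassifier.train(train_set)
label = classifier.classify(word_features(["delicious", "soup"]))
print(label)
```

With a realistic data set, `nltk.classify.accuracy(classifier, test_set)` can be used to measure performance on held-out examples.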
  13. NLP in NLTK – Entity Recognition • Resources: from nltk.chunk import *
  14. NLP in NLTK – Grammar Tree • Resources: from nltk.parse import *
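
A grammar tree like the one slide 14 refers to can be produced with a small context-free grammar and a chart parser. The grammar and sentence below are invented for illustration, and the sketch assumes NLTK 3, where `CFG.fromstring` is available (the NLTK 2 releases current when the talk was given used a different function name).

```python
from nltk import CFG
from nltk.parse import ChartParser

# Toy grammar, invented for this example.
grammar = CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'dog' | 'cat'
V -> 'chased'
""")

parser = ChartParser(grammar)
sentence = "the dog chased the cat".split()
trees = list(parser.parse(sentence))  # one Tree per valid parse
print(trees[0])
```

Each result is an `nltk.Tree`, so `trees[0].leaves()` returns the words and `trees[0].label()` returns the root category.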
  15. NLP in NLTK – Sentence Semantics • Propositional Logic • First-Order Logic • Discourse Semantics • Resources: from nltk.sem import *
  16. Outline • 1. An application based on NLP: 聚寶評 • 2. Introduction to Natural Language Processing • 3. Brief History of NLTK • 4. Overview of NLTK • 5. Application of NLTK: Topic Classification on PTT
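  17. Topic Classification on PTT • Popular boards: – Boards whose posts have a clear topic: Food, HatePolitics, Baseball, Stock, Boy-Girl – Board whose posts cover a wide range of topics: Gossiping • Goal: classify Gossiping posts by topic, so that readers can pick out only the posts on topics that interest them.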
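
The goal on slide 17 suggests training on boards with a clear topic and then labeling posts from the broad board. The sketch below shows one way this could work, using only the Python standard library. It is not the author's system: character bigrams as a stand-in for Chinese word segmentation, the multinomial naive Bayes model with add-one smoothing, the function names, and the four training sentences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_bigrams(text):
    """Character bigrams: a crude substitute for Chinese word segmentation."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def train(labeled_posts):
    """Count bigrams per board and posts per board from (text, board) pairs."""
    feature_counts = defaultdict(Counter)
    post_counts = Counter()
    for text, board in labeled_posts:
        post_counts[board] += 1
        feature_counts[board].update(char_bigrams(text))
    return feature_counts, post_counts


def classify(text, feature_counts, post_counts):
    """Return the board with the highest naive Bayes log score."""
    vocab = {f for counts in feature_counts.values() for f in counts}
    total_posts = sum(post_counts.values())
    best_board, best_score = None, float("-inf")
    for board in post_counts:
        counts = feature_counts[board]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(post_counts[board] / total_posts)
        for feature in char_bigrams(text):
            score += math.log((counts[feature] + 1) / denom)
        if score > best_score:
            best_board, best_score = board, score
    return best_board


# Made-up posts standing in for boards with a clear topic.
training_posts = [
    ("這家牛肉麵很好吃", "Food"),
    ("推薦巷口的拉麵店", "Food"),
    ("今天股票大漲", "Stock"),
    ("股票下跌該賣嗎", "Stock"),
]
model = train(training_posts)

# A made-up post standing in for one from the Gossiping board.
predicted = classify("哪裡的牛肉麵好吃", *model)
print(predicted)
```

Training only on boards with a clear topic means the board name itself serves as the label, so no manual annotation is needed.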
  18. System Flow
  19. Tokenization
  20. Text Classification
  21. Result – Boy-Girl
  22. Result – HatePolitics
  23. Result – Food
  24. Result – Stock
  25. References • Steven Bird, Ewan Klein, and Edward Loper, “Natural Language Processing with Python”, 2009. • Jacob Perkins, “Python Text Processing with NLTK 2.0 Cookbook”, 2010. • Edward Loper and Steven Bird, “NLTK: The Natural Language Toolkit”, Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, 2002.
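  26. We are hiring • Core engine algorithm R&D engineer • System R&D engineer • Web application R&D engineer • Market research and web service product design manager • Oxygen-Intelligence Taiwan Limited 引京聚點 知識結構搜索股份有限公司 • Company and job descriptions: http://goo.gl/amjAJ • Please send your résumé to jimmy.lai@oi-sys.com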
