SlideShare a Scribd company logo
1 of 26
Download to read offline
NLTK: Natural Language Toolkit
  Overview and Application
                  Jimmy Lai
            Jimmy.lai@oi-sys.com
   Software Engineer @ Oxygen Intelligence
                 2012/03/21


                                             1
Outline
1.   An application based on NLP: 聚寶評
2.   Introduction to Natural Language Processing
3.   Brief History of NLTK
4.   Overview of NLTK
5.   Application of NLTK: Topic Classification on
     PTT



                                                    2
聚寶評 www.ezpao.com

      美食搜尋引擎




                    3
聚寶評 www.ezpao.com

     語意分析搜尋引擎




                    4
評論主題分析




網友分享菜分析




  正評/負評分析




                     5
Natural Language Processing (NLP)
•    語音識別(Speech recognition)
•    詞性標註(Part-of-speech tagging)
•    句法分析(Parsing)
•    自然語言生成(Natural language generation)
•    文本分類(Text classification)
•    信息抽取(Information extraction)
•    機器翻譯(Machine translation)
•    文字蘊涵(Textual entailment)
    via Wikipedia
                                           6
NLTK: Natural Language Toolkit
• http://www.nltk.org/
• Author: Steven Bird, Edward Loper, Ewan Klein
• Originally developed for class student has
  background either in computer science or
  linguistics.
• Currently:
  – Education: over 100 courses in 23 countries.
  – Research: over 250 papers cites NLTK.

                                                   7
Outline
1.   An application based on NLP: 聚寶評
2.   Introduction to Natural Language Processing
3.   Brief History of NLTK
4.   Overview of NLTK
5.   Application of NLTK: Topic Classification on
     PTT



                                                    8
NLP in NLTK –
Annotated Text Corpora




             Resources:
             from nltk.corpus import *
                                         9
NLP in NLTK –
 Text Tokenization, Normalization
Text Processing Flow




                       Resources:
                       from nltk.tokenize import *
                                                     10
NLP in NLTK –
Part-of-speech Tagging




              Resources:
              from nltk.tag import *
                                       11
NLP in NLTK –
Text Classification




            Resources:
            from nltk.classify import *
                                          12
NLP in NLTK –
Entity Recognition




            Resources:
            from nltk.chunk import *
                                       13
NLP in NLTK –
Grammar Tree




         Resources:
         from nltk.parse import *
                                    14
NLP in NLTK –
          Semantic of Sentence
• Propositional Logic
• First-Order Logic
• Disclosure Semantics




                         Resources:
                         from nltk.sem import *
                                                  15
Outline
1.   An application based on NLP: 聚寶評
2.   Introduction to Natural Language Processing
3.   Brief History of NLTK
4.   Overview of NLTK
5.   Application of NLTK: Topic Classification on
     PTT



                                                    16
Topic Classification on PTT
• 熱門看板:
 – 文章主題明確: Food(美食版), HatePolitics(政黑
   板), Baseball(棒球版), Stock(股票版), Boy-Girl(男
   女版)
 – 文章主題廣泛: Gossiping(八卦版)
• 目標: 將八卦版的文章依照主題分類,就可
  以只挑選有興趣的主題的文章來閱讀。


                                               17
System Flow




              18
Tokenization




               19
Text Classification




                      20
Result – Boy-Girl




                    21
Result – HatePolitics




                        22
Result – Food




                23
Result – Stock




                 24
Reference
• Steven Bird, Ewan Klein, and Edward
  Loper, “Natural Language Processing with
  Python”, 2009.
• Jacob Perkins, “Python Text Processing with NLTK
  2.0 Cookbook”, 2010
• Loper, E., & Bird, S. (2002). NLTK: The Natural
  Language Toolkit. Proceedings of the ACL02
  Workshop on Effective tools and methodologies
  for teaching natural language processing and
  computational linguistics.
                                                 25
We are hiring
•   核心引擎演算法研發工程師
•   系統研發工程師
•   網路應用研發工程師
•   市場研究及網路服務產品設計經理

• Oxygen-Intelligence Taiwan Limited
  引京聚點 知識結構搜索股份有限公司
• 公司簡介、職缺簡介:http://goo.gl/amjAJ
• 請將履歷寄到 jimmy.lai@oi-sys.com
                                       26

More Related Content

What's hot

Document Classification using the Python Natural Language Toolkit
Document Classification using the Python Natural Language ToolkitDocument Classification using the Python Natural Language Toolkit
Document Classification using the Python Natural Language ToolkitBen Healey
 
Elements of Text Mining Part - I
Elements of Text Mining Part - IElements of Text Mining Part - I
Elements of Text Mining Part - IJaganadh Gopinadhan
 
Sk t academy lecture note
Sk t academy lecture noteSk t academy lecture note
Sk t academy lecture noteSusang Kim
 
Natural Language Toolkit (NLTK), Basics
Natural Language Toolkit (NLTK), Basics Natural Language Toolkit (NLTK), Basics
Natural Language Toolkit (NLTK), Basics Prakash Pimpale
 
Natural language processing (Python)
Natural language processing (Python)Natural language processing (Python)
Natural language processing (Python)Sumit Raj
 
Natural Language Processing and Python
Natural Language Processing and PythonNatural Language Processing and Python
Natural Language Processing and Pythonanntp
 
NLP Deep Learning with Tensorflow
NLP Deep Learning with TensorflowNLP Deep Learning with Tensorflow
NLP Deep Learning with Tensorflowseungwoo kim
 
KiwiPyCon 2014 talk - Understanding human language with Python
KiwiPyCon 2014 talk - Understanding human language with PythonKiwiPyCon 2014 talk - Understanding human language with Python
KiwiPyCon 2014 talk - Understanding human language with PythonAlyona Medelyan
 
Text as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew BibleText as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew BibleDirk Roorda
 
Devoxx traitement automatique du langage sur du texte en 2019
Devoxx   traitement automatique du langage sur du texte en 2019 Devoxx   traitement automatique du langage sur du texte en 2019
Devoxx traitement automatique du langage sur du texte en 2019 Alexis Agahi
 
Developing Korean Chatbot 101
Developing Korean Chatbot 101Developing Korean Chatbot 101
Developing Korean Chatbot 101Jaemin Cho
 
Categorizing and pos tagging with nltk python
Categorizing and pos tagging with nltk pythonCategorizing and pos tagging with nltk python
Categorizing and pos tagging with nltk pythonJanu Jahnavi
 
Introduction To Python | Edureka
Introduction To Python | EdurekaIntroduction To Python | Edureka
Introduction To Python | EdurekaEdureka!
 
Most Asked Python Interview Questions
Most Asked Python Interview QuestionsMost Asked Python Interview Questions
Most Asked Python Interview QuestionsShubham Shrimant
 
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...Edureka!
 
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...Edureka!
 
HackYale - Natural Language Processing (Week 0)
HackYale - Natural Language Processing (Week 0)HackYale - Natural Language Processing (Week 0)
HackYale - Natural Language Processing (Week 0)Nick Hathaway
 
BERT Finetuning Webinar Presentation
BERT Finetuning Webinar PresentationBERT Finetuning Webinar Presentation
BERT Finetuning Webinar Presentationbhavesh_physics
 

What's hot (20)

Document Classification using the Python Natural Language Toolkit
Document Classification using the Python Natural Language ToolkitDocument Classification using the Python Natural Language Toolkit
Document Classification using the Python Natural Language Toolkit
 
Elements of Text Mining Part - I
Elements of Text Mining Part - IElements of Text Mining Part - I
Elements of Text Mining Part - I
 
Sk t academy lecture note
Sk t academy lecture noteSk t academy lecture note
Sk t academy lecture note
 
Natural Language Toolkit (NLTK), Basics
Natural Language Toolkit (NLTK), Basics Natural Language Toolkit (NLTK), Basics
Natural Language Toolkit (NLTK), Basics
 
Natural language processing (Python)
Natural language processing (Python)Natural language processing (Python)
Natural language processing (Python)
 
Natural Language Processing and Python
Natural Language Processing and PythonNatural Language Processing and Python
Natural Language Processing and Python
 
NLP Deep Learning with Tensorflow
NLP Deep Learning with TensorflowNLP Deep Learning with Tensorflow
NLP Deep Learning with Tensorflow
 
KiwiPyCon 2014 talk - Understanding human language with Python
KiwiPyCon 2014 talk - Understanding human language with PythonKiwiPyCon 2014 talk - Understanding human language with Python
KiwiPyCon 2014 talk - Understanding human language with Python
 
Text as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew BibleText as Data: processing the Hebrew Bible
Text as Data: processing the Hebrew Bible
 
Devoxx traitement automatique du langage sur du texte en 2019
Devoxx   traitement automatique du langage sur du texte en 2019 Devoxx   traitement automatique du langage sur du texte en 2019
Devoxx traitement automatique du langage sur du texte en 2019
 
Developing Korean Chatbot 101
Developing Korean Chatbot 101Developing Korean Chatbot 101
Developing Korean Chatbot 101
 
Categorizing and pos tagging with nltk python
Categorizing and pos tagging with nltk pythonCategorizing and pos tagging with nltk python
Categorizing and pos tagging with nltk python
 
Introduction To Python | Edureka
Introduction To Python | EdurekaIntroduction To Python | Edureka
Introduction To Python | Edureka
 
Most Asked Python Interview Questions
Most Asked Python Interview QuestionsMost Asked Python Interview Questions
Most Asked Python Interview Questions
 
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...
 
Turtlebot Poster_Summer 2016
Turtlebot Poster_Summer 2016Turtlebot Poster_Summer 2016
Turtlebot Poster_Summer 2016
 
Python - Lesson 1
Python - Lesson 1Python - Lesson 1
Python - Lesson 1
 
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Tra...
 
HackYale - Natural Language Processing (Week 0)
HackYale - Natural Language Processing (Week 0)HackYale - Natural Language Processing (Week 0)
HackYale - Natural Language Processing (Week 0)
 
BERT Finetuning Webinar Presentation
BERT Finetuning Webinar PresentationBERT Finetuning Webinar Presentation
BERT Finetuning Webinar Presentation
 

Viewers also liked

Corpus Bootstrapping with NLTK
Corpus Bootstrapping with NLTKCorpus Bootstrapping with NLTK
Corpus Bootstrapping with NLTKJacob Perkins
 
NLTK - Natural Language Processing in Python
NLTK - Natural Language Processing in PythonNLTK - Natural Language Processing in Python
NLTK - Natural Language Processing in Pythonshanbady
 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with PythonBenjamin Bengfort
 
codin9cafe[2015.03. 18]Python learning for natural language processing - 홍은기(...
codin9cafe[2015.03. 18]Python learning for natural language processing - 홍은기(...codin9cafe[2015.03. 18]Python learning for natural language processing - 홍은기(...
codin9cafe[2015.03. 18]Python learning for natural language processing - 홍은기(...codin9cafe
 
ZOETWITT in the Press
ZOETWITT in the PressZOETWITT in the Press
ZOETWITT in the Presszoetwitt
 
Semantic Twitter Analyzing Tweets For Real Time Event Notification
Semantic Twitter Analyzing Tweets For Real Time Event NotificationSemantic Twitter Analyzing Tweets For Real Time Event Notification
Semantic Twitter Analyzing Tweets For Real Time Event Notificationokazaki117
 
semantic social network analysis
semantic social network analysissemantic social network analysis
semantic social network analysisguillaume ereteo
 
IEEE 2016-2017 SOFTWARE TITLE
IEEE 2016-2017  SOFTWARE TITLE IEEE 2016-2017  SOFTWARE TITLE
IEEE 2016-2017 SOFTWARE TITLE FOCUSLOGICPROJECTS
 
Ijcai ip-2015 cyberbullying-final
Ijcai ip-2015 cyberbullying-finalIjcai ip-2015 cyberbullying-final
Ijcai ip-2015 cyberbullying-finalMichal Ptaszynski
 
[LDSP] Solr Usage
[LDSP] Solr Usage[LDSP] Solr Usage
[LDSP] Solr UsageJimmy Lai
 
Fast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython NotebookFast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython NotebookJimmy Lai
 
Data Analyst Nanodegree
Data Analyst NanodegreeData Analyst Nanodegree
Data Analyst NanodegreeJimmy Lai
 
[LDSP] Search Engine Back End API Solution for Fast Prototyping
[LDSP] Search Engine Back End API Solution for Fast Prototyping[LDSP] Search Engine Back End API Solution for Fast Prototyping
[LDSP] Search Engine Back End API Solution for Fast PrototypingJimmy Lai
 
Software development practices in python
Software development practices in pythonSoftware development practices in python
Software development practices in pythonJimmy Lai
 
Documentation with sphinx @ PyHug
Documentation with sphinx @ PyHugDocumentation with sphinx @ PyHug
Documentation with sphinx @ PyHugJimmy Lai
 
When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012Jimmy Lai
 
Basic NLP with Python and NLTK
Basic NLP with Python and NLTKBasic NLP with Python and NLTK
Basic NLP with Python and NLTKFrancesco Bruni
 
Real-time Tweet Analysis w/ Maltego Carbon 3.5.3
Real-time Tweet Analysis w/ Maltego Carbon 3.5.3 Real-time Tweet Analysis w/ Maltego Carbon 3.5.3
Real-time Tweet Analysis w/ Maltego Carbon 3.5.3 Shalin Hai-Jew
 

Viewers also liked (20)

Corpus Bootstrapping with NLTK
Corpus Bootstrapping with NLTKCorpus Bootstrapping with NLTK
Corpus Bootstrapping with NLTK
 
NLTK - Natural Language Processing in Python
NLTK - Natural Language Processing in PythonNLTK - Natural Language Processing in Python
NLTK - Natural Language Processing in Python
 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with Python
 
codin9cafe[2015.03. 18]Python learning for natural language processing - 홍은기(...
codin9cafe[2015.03. 18]Python learning for natural language processing - 홍은기(...codin9cafe[2015.03. 18]Python learning for natural language processing - 홍은기(...
codin9cafe[2015.03. 18]Python learning for natural language processing - 홍은기(...
 
ZOETWITT in the Press
ZOETWITT in the PressZOETWITT in the Press
ZOETWITT in the Press
 
Python & Stuff
Python & StuffPython & Stuff
Python & Stuff
 
Semantic Twitter Analyzing Tweets For Real Time Event Notification
Semantic Twitter Analyzing Tweets For Real Time Event NotificationSemantic Twitter Analyzing Tweets For Real Time Event Notification
Semantic Twitter Analyzing Tweets For Real Time Event Notification
 
semantic social network analysis
semantic social network analysissemantic social network analysis
semantic social network analysis
 
IEEE 2016-2017 SOFTWARE TITLE
IEEE 2016-2017  SOFTWARE TITLE IEEE 2016-2017  SOFTWARE TITLE
IEEE 2016-2017 SOFTWARE TITLE
 
Ijcai ip-2015 cyberbullying-final
Ijcai ip-2015 cyberbullying-finalIjcai ip-2015 cyberbullying-final
Ijcai ip-2015 cyberbullying-final
 
[LDSP] Solr Usage
[LDSP] Solr Usage[LDSP] Solr Usage
[LDSP] Solr Usage
 
Fast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython NotebookFast data mining flow prototyping using IPython Notebook
Fast data mining flow prototyping using IPython Notebook
 
Data Analyst Nanodegree
Data Analyst NanodegreeData Analyst Nanodegree
Data Analyst Nanodegree
 
Aisb cyberbullying
Aisb cyberbullyingAisb cyberbullying
Aisb cyberbullying
 
[LDSP] Search Engine Back End API Solution for Fast Prototyping
[LDSP] Search Engine Back End API Solution for Fast Prototyping[LDSP] Search Engine Back End API Solution for Fast Prototyping
[LDSP] Search Engine Back End API Solution for Fast Prototyping
 
Software development practices in python
Software development practices in pythonSoftware development practices in python
Software development practices in python
 
Documentation with sphinx @ PyHug
Documentation with sphinx @ PyHugDocumentation with sphinx @ PyHug
Documentation with sphinx @ PyHug
 
When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012When big data meet python @ COSCUP 2012
When big data meet python @ COSCUP 2012
 
Basic NLP with Python and NLTK
Basic NLP with Python and NLTKBasic NLP with Python and NLTK
Basic NLP with Python and NLTK
 
Real-time Tweet Analysis w/ Maltego Carbon 3.5.3
Real-time Tweet Analysis w/ Maltego Carbon 3.5.3 Real-time Tweet Analysis w/ Maltego Carbon 3.5.3
Real-time Tweet Analysis w/ Maltego Carbon 3.5.3
 

Similar to Nltk natural language toolkit overview and application @ PyHug

Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingVeenaSKumar2
 
An Introduction to NLP4L
An Introduction to NLP4LAn Introduction to NLP4L
An Introduction to NLP4LKoji Sekiguchi
 
NATURAL LANGUAGE PROCESSING.pptx
NATURAL LANGUAGE PROCESSING.pptxNATURAL LANGUAGE PROCESSING.pptx
NATURAL LANGUAGE PROCESSING.pptxFitsum36
 
Advanced Natural Language Processing with Apache Spark NLP
Advanced Natural Language Processing with Apache Spark NLPAdvanced Natural Language Processing with Apache Spark NLP
Advanced Natural Language Processing with Apache Spark NLPDatabricks
 
Presentation of OpenNLP
Presentation of OpenNLPPresentation of OpenNLP
Presentation of OpenNLPRobert Viseur
 
frontmatter.pptx
frontmatter.pptxfrontmatter.pptx
frontmatter.pptxNicole91901
 
Nlp tutorial using python nltk (simple examples)
Nlp tutorial using python nltk (simple examples)Nlp tutorial using python nltk (simple examples)
Nlp tutorial using python nltk (simple examples)Mokhtar Ebrahim
 
Natural language processing and its application in ai
Natural language processing and its application in aiNatural language processing and its application in ai
Natural language processing and its application in aiRam Kumar
 
An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...
An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...
An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...Lucidworks
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...alessio_ferrari
 
Why Python 2015017.pptx
Why Python 2015017.pptxWhy Python 2015017.pptx
Why Python 2015017.pptxHari35159
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processingMinh Pham
 
Natural language identification
Natural language identificationNatural language identification
Natural language identificationShaktiTaneja
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingGeeks Anonymes
 
Natural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application TrendsNatural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application TrendsShreyas Suresh Rao
 
Roeder rocky 2011_46
Roeder rocky 2011_46Roeder rocky 2011_46
Roeder rocky 2011_46Chris Roeder
 

Similar to Nltk natural language toolkit overview and application @ PyHug (20)

Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
The State of #NLProc
The State of #NLProcThe State of #NLProc
The State of #NLProc
 
Intro to KotlinNLP
Intro to KotlinNLPIntro to KotlinNLP
Intro to KotlinNLP
 
Introduction to KotlinNLP
Introduction to KotlinNLPIntroduction to KotlinNLP
Introduction to KotlinNLP
 
An Introduction to NLP4L
An Introduction to NLP4LAn Introduction to NLP4L
An Introduction to NLP4L
 
NATURAL LANGUAGE PROCESSING.pptx
NATURAL LANGUAGE PROCESSING.pptxNATURAL LANGUAGE PROCESSING.pptx
NATURAL LANGUAGE PROCESSING.pptx
 
ppt
pptppt
ppt
 
Advanced Natural Language Processing with Apache Spark NLP
Advanced Natural Language Processing with Apache Spark NLPAdvanced Natural Language Processing with Apache Spark NLP
Advanced Natural Language Processing with Apache Spark NLP
 
Presentation of OpenNLP
Presentation of OpenNLPPresentation of OpenNLP
Presentation of OpenNLP
 
frontmatter.pptx
frontmatter.pptxfrontmatter.pptx
frontmatter.pptx
 
Nlp tutorial using python nltk (simple examples)
Nlp tutorial using python nltk (simple examples)Nlp tutorial using python nltk (simple examples)
Nlp tutorial using python nltk (simple examples)
 
Natural language processing and its application in ai
Natural language processing and its application in aiNatural language processing and its application in ai
Natural language processing and its application in ai
 
An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...
An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...
An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...
 
Why Python 2015017.pptx
Why Python 2015017.pptxWhy Python 2015017.pptx
Why Python 2015017.pptx
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processing
 
Natural language identification
Natural language identificationNatural language identification
Natural language identification
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application TrendsNatural Language Processing - Research and Application Trends
Natural Language Processing - Research and Application Trends
 
Roeder rocky 2011_46
Roeder rocky 2011_46Roeder rocky 2011_46
Roeder rocky 2011_46
 

More from Jimmy Lai

Python Linters at Scale.pdf
Python Linters at Scale.pdfPython Linters at Scale.pdf
Python Linters at Scale.pdfJimmy Lai
 
The journey of asyncio adoption in instagram
The journey of asyncio adoption in instagramThe journey of asyncio adoption in instagram
The journey of asyncio adoption in instagramJimmy Lai
 
Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...Jimmy Lai
 
Continuous Delivery: automated testing, continuous integration and continuous...
Continuous Delivery: automated testing, continuous integration and continuous...Continuous Delivery: automated testing, continuous integration and continuous...
Continuous Delivery: automated testing, continuous integration and continuous...Jimmy Lai
 
Build a Searchable Knowledge Base
Build a Searchable Knowledge BaseBuild a Searchable Knowledge Base
Build a Searchable Knowledge BaseJimmy Lai
 
Text classification in scikit-learn
Text classification in scikit-learnText classification in scikit-learn
Text classification in scikit-learnJimmy Lai
 
Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Jimmy Lai
 
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...Jimmy Lai
 
Apache thrift-RPC service cross languages
Apache thrift-RPC service cross languagesApache thrift-RPC service cross languages
Apache thrift-RPC service cross languagesJimmy Lai
 
NetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHugNetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHugJimmy Lai
 

More from Jimmy Lai (10)

Python Linters at Scale.pdf
Python Linters at Scale.pdfPython Linters at Scale.pdf
Python Linters at Scale.pdf
 
The journey of asyncio adoption in instagram
The journey of asyncio adoption in instagramThe journey of asyncio adoption in instagram
The journey of asyncio adoption in instagram
 
Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...
 
Continuous Delivery: automated testing, continuous integration and continuous...
Continuous Delivery: automated testing, continuous integration and continuous...Continuous Delivery: automated testing, continuous integration and continuous...
Continuous Delivery: automated testing, continuous integration and continuous...
 
Build a Searchable Knowledge Base
Build a Searchable Knowledge BaseBuild a Searchable Knowledge Base
Build a Searchable Knowledge Base
 
Text classification in scikit-learn
Text classification in scikit-learnText classification in scikit-learn
Text classification in scikit-learn
 
Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013
 
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
 
Apache thrift-RPC service cross languages
Apache thrift-RPC service cross languagesApache thrift-RPC service cross languages
Apache thrift-RPC service cross languages
 
NetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHugNetworkX - python graph analysis and visualization @ PyHug
NetworkX - python graph analysis and visualization @ PyHug
 

Recently uploaded

UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2DianaGray10
 
How to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptxHow to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptxKaustubhBhavsar6
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
 
CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024Brian Pichman
 
Flow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameFlow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameKapil Thakar
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updateadam112203
 
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechWebinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechProduct School
 
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Alkin Tezuysal
 
March Patch Tuesday
March Patch TuesdayMarch Patch Tuesday
March Patch TuesdayIvanti
 
AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024Brian Pichman
 
The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)IES VE
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxNeo4j
 
.NET 8 ChatBot with Azure OpenAI Services.pptx
.NET 8 ChatBot with Azure OpenAI Services.pptx.NET 8 ChatBot with Azure OpenAI Services.pptx
.NET 8 ChatBot with Azure OpenAI Services.pptxHansamali Gamage
 
20140402 - Smart house demo kit
20140402 - Smart house demo kit20140402 - Smart house demo kit
20140402 - Smart house demo kitJamie (Taka) Wang
 
UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3DianaGray10
 
Automation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsAutomation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsDianaGray10
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0DanBrown980551
 
Oracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxOracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxSatishbabu Gunukula
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Muhammad Tiham Siddiqui
 

Recently uploaded (20)

UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2
 
How to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptxHow to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptx
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024
 
Flow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameFlow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First Frame
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 update
 
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechWebinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
 
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
 
March Patch Tuesday
March Patch TuesdayMarch Patch Tuesday
March Patch Tuesday
 
AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024
 
The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
 
SheDev 2024
SheDev 2024SheDev 2024
SheDev 2024
 
.NET 8 ChatBot with Azure OpenAI Services.pptx
.NET 8 ChatBot with Azure OpenAI Services.pptx.NET 8 ChatBot with Azure OpenAI Services.pptx
.NET 8 ChatBot with Azure OpenAI Services.pptx
 
20140402 - Smart house demo kit
20140402 - Smart house demo kit20140402 - Smart house demo kit
20140402 - Smart house demo kit
 
UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3
 
Automation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsAutomation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projects
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0
 
Oracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxOracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptx
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)
 

Nltk natural language toolkit overview and application @ PyHug

  • 1. NLTK: Natural Language Toolkit Overview and Application Jimmy Lai Jimmy.lai@oi-sys.com Software Engineer @ Oxygen Intelligence 2012/03/21 1
  • 2. Outline 1. An application based on NLP: 聚寶評 2. Introduction to Natural Language Processing 3. Brief History of NLTK 4. Overview of NLTK 5. Application of NLTK: Topic Classification on PTT 2
  • 3. 聚寶評 www.ezpao.com 美食搜尋引擎 3
  • 4. 聚寶評 www.ezpao.com 語意分析搜尋引擎 4
  • 6. Natural Language Processing (NLP) • 語音識別(Speech recognition) • 詞性標註(Part-of-speech tagging) • 句法分析(Parsing) • 自然語言生成(Natural language generation) • 文本分類(Text classification) • 信息抽取(Information extraction) • 機器翻譯(Machine translation) • 文字蘊涵(Textual entailment) via Wikipedia 6
  • 7. NLTK: Natural Language Toolkit • http://www.nltk.org/ • Author: Steven Bird, Edward Loper, Ewan Klein • Originally developed for class student has background either in computer science or linguistics. • Currently: – Education: over 100 courses in 23 countries. – Research: over 250 papers cites NLTK. 7
  • 8. Outline 1. An application based on NLP: 聚寶評 2. Introduction to Natural Language Processing 3. Brief History of NLTK 4. Overview of NLTK 5. Application of NLTK: Topic Classification on PTT 8
  • 9. NLP in NLTK – Annotated Text Corpora Resources: from nltk.corpus import * 9
  • 10. NLP in NLTK – Text Tokenization, Normalization Text Processing Flow Resources: from nltk.tokenize import * 10
  • 11. NLP in NLTK – Part-of-speech Tagging Resources: from nltk.tag import * 11
  • 12. NLP in NLTK – Text Classification Resources: from nltk.classify import * 12
  • 13. NLP in NLTK – Entity Recognition Resources: from nltk.chunk import * 13
  • 14. NLP in NLTK – Grammar Tree Resources: from nltk.parse import * 14
  • 15. NLP in NLTK – Semantic of Sentence • Propositional Logic • First-Order Logic • Disclosure Semantics Resources: from nltk.sem import * 15
  • 16. Outline 1. An application based on NLP: 聚寶評 2. Introduction to Natural Language Processing 3. Brief History of NLTK 4. Overview of NLTK 5. Application of NLTK: Topic Classification on PTT 16
  • 17. Topic Classification on PTT • 熱門看板: – 文章主題明確: Food(美食版), HatePolitics(政黑 板), Baseball(棒球版), Stock(股票版), Boy-Girl(男 女版) – 文章主題廣泛: Gossiping(八卦版) • 目標: 將八卦版的文章依照主題分類,就可 以只挑選有興趣的主題的文章來閱讀。 17
  • 25. Reference • Steven Bird, Ewan Klein, and Edward Loper, “Natural Language Processing with Python”, 2009. • Jacob Perkins, “Python Text Processing with NLTK 2.0 Cookbook”, 2010 • Loper, E., & Bird, S. (2002). NLTK: The Natural Language Toolkit. Proceedings of the ACL02 Workshop on Effective tools and methodologies for teaching natural language processing and computational linguistics. 25
  • 26. We are hiring • 核心引擎演算法研發工程師 • 系統研發工程師 • 網路應用研發工程師 • 市場研究及網路服務產品設計經理 • Oxygen-Intelligence Taiwan Limited 引京聚點 知識結構搜索股份有限公司 • 公司簡介、職缺簡介:http://goo.gl/amjAJ • 請將履歷寄到 jimmy.lai@oi-sys.com 26