Python for NLP and the Natural
Language Toolkit
Introductions
 Anirudh K Menon
Software Engineer at IGate, part of the Big Data & Analytics team.
I am a computer science engineer with experience in web development and big data.
My e-mail : animenon@mail.com
 You?
Natural Language Processing
 Fundamental goal: deep understanding of broad language
 Not just string processing or keyword matching!
 End systems that we want to build:
 Ambitious: speech recognition, machine translation, question answering…
 Modest: spelling correction, text categorization…
Example: Machine Translation
NLP applications
 Text Categorization
 Spelling & Grammar Corrections
 Information Extraction
 Speech Recognition
 Information Retrieval
 Synonym Generation
 Summarization
 Machine Translation
 Question Answering
 Dialog Systems
 Language generation
Why NLP is difficult
 An NLP system needs to answer the question “who did what to whom”
 Language is ambiguous
 At all levels: lexical, phrase, semantic
 Iraqi Head Seeks Arms
 Word sense is ambiguous (head, arms)
 Stolen Painting Found by Tree
 Thematic role is ambiguous: tree is agent or location?
 Ban on Nude Dancing on Governor’s Desk
 Syntactic structure (attachment) is ambiguous: is the ban or the dancing on the desk?
 Hospitals Are Sued by 7 Foot Doctors
 Semantics is ambiguous : what is 7 foot?
Why NLP is difficult
 Language is flexible
 New words, new meanings
 Different meanings in different contexts
 Language is subtle
 He arrived at the lecture
 He chuckled at the lecture
 He chuckled his way through the lecture
 **He arrived his way through the lecture
 Language is complex!
Corpus-based statistical approaches to tackle NLP problems
 How can a machine understand these differences?
 Decorate the cake with the frosting
 Decorate the cake with the kids
 Rule-based approaches, i.e. hand-coded syntactic constraints and preference rules:
 The verb decorate requires an animate being as agent
 The object cake is formed by any of the following inanimate entities (cream, dough, frosting…)
 Such approaches have been shown to be time-consuming to build, to scale poorly, and to be very brittle when faced with new, unusual, or metaphorical uses of language
 To swallow requires an animate being as agent/subject and a physical object as object
 I swallowed his story or the actor swallowed his lines.
 The supernova swallowed the planet
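The brittleness described above can be illustrated with a minimal sketch. The word lists and the `accepts` helper below are hypothetical, invented only for illustration; they hand-code the slide's rule that "swallow" needs an animate agent and a physical object, and then show how the rule rejects the metaphorical and unusual sentences.

```python
# Hypothetical hand-written lexicon (an assumption, not from any real resource).
ANIMATE = {"i", "actor", "child", "dog"}
PHYSICAL = {"pill", "planet", "cake", "water"}

# Hand-coded rule from the slide: "swallow" requires an animate agent
# and a physical object.
RULES = {"swallow": {"agent": ANIMATE, "object": PHYSICAL}}


def accepts(verb: str, agent: str, obj: str) -> bool:
    """Return True if the (agent, verb, object) triple satisfies the rule."""
    rule = RULES[verb]
    return agent.lower() in rule["agent"] and obj.lower() in rule["object"]


print(accepts("swallow", "child", "pill"))        # literal use
print(accepts("swallow", "I", "story"))           # metaphorical object
print(accepts("swallow", "supernova", "planet"))  # inanimate agent
```

Only the first call returns True. Covering the other two would mean adding more exceptions by hand, which is the scaling problem the slide points to.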
Corpus-based statistical approaches to tackle NLP problems
 Feature extraction (usually linguistically motivated)
 Statistical models
 Data (corpora, labels, linguistic resources)
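The three ingredients above (features, a statistical model, data) can be sketched in a few lines. Everything here is an assumption made for illustration: the training examples are invented toy data, the labels "instrument" and "companion" are hypothetical names for the two readings of "with", and the model is a naive Bayes-style count model with add-one smoothing rather than anything taken from the slides.

```python
from collections import Counter, defaultdict

# Data: toy, invented examples of (verb, noun after "with", reading).
DATA = [
    ("decorate", "frosting", "instrument"),
    ("decorate", "cream", "instrument"),
    ("decorate", "kids", "companion"),
    ("bake", "friends", "companion"),
    ("cut", "knife", "instrument"),
    ("eat", "kids", "companion"),
]


def extract_features(verb, noun):
    """Feature extraction: turn one example into a list of string features."""
    return [f"verb={verb}", f"noun={noun}"]


def train(data):
    """Statistical model: count labels and label/feature co-occurrences."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    for verb, noun, label in data:
        label_counts[label] += 1
        for feat in extract_features(verb, noun):
            feature_counts[label][feat] += 1
    return label_counts, feature_counts


def classify(model, verb, noun):
    """Pick the label with the highest smoothed naive Bayes score."""
    label_counts, feature_counts = model
    total = sum(label_counts.values())
    vocab = {f for counts in feature_counts.values() for f in counts}
    scores = {}
    for label, n in label_counts.items():
        score = n / total
        denom = sum(feature_counts[label].values()) + len(vocab)
        for feat in extract_features(verb, noun):
            score *= (feature_counts[label][feat] + 1) / denom
        scores[label] = score
    return max(scores, key=scores.get)


model = train(DATA)
print(classify(model, "decorate", "frosting"))
print(classify(model, "decorate", "kids"))
```

No rule about animacy is written down anywhere; the preference for each reading comes only from the counts, so adding more labeled data changes the behaviour without editing code.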
Intro to NLTK
 NLTK is a set of Python modules for carrying out many common natural
language tasks.
 NLTK defines an infrastructure that can be used to build NLP programs in
Python.
 It provides basic classes for representing data relevant to natural language
processing.
 NLTK runs on Windows, OS X, Unix, and Linux. Detailed instructions are
on the Installation tab
 Install the package (any platform):
$ pip install --upgrade nltk
 Then download the corpora and models from a Python shell:
>>> import nltk
>>> nltk.download('all')
NLTK: Top-Level Organization
 NLTK is organized as a flat hierarchy of packages
and modules.
 Each module provides the tools necessary to
address a specific task
 Modules contain two types of classes:
 Data-oriented classes are used to represent information
relevant to natural language processing.
 Task-oriented classes encapsulate the resources and
methods needed to perform a specific task.
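The distinction between data-oriented and task-oriented classes can be sketched as follows. The `Token` and `WhitespaceTokenizer` classes are hypothetical and written only for this illustration; they are not NLTK's own classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    """Data-oriented: holds one piece of text and its character offsets."""
    text: str
    start: int
    end: int


class WhitespaceTokenizer:
    """Task-oriented: knows how to turn a string into Token objects."""

    def tokenize(self, text: str) -> List[Token]:
        tokens = []
        position = 0
        for piece in text.split():
            # Search from the end of the previous token so that repeated
            # words get their own offsets.
            start = text.index(piece, position)
            end = start + len(piece)
            tokens.append(Token(piece, start, end))
            position = end
        return tokens


tokens = WhitespaceTokenizer().tokenize("my dog likes his dog")
print([t.text for t in tokens])
print(tokens[-1])
```

Keeping the two roles apart means a different tokenizer could be swapped in while the rest of a program keeps working with the same `Token` objects.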
Modules
 The NLTK modules include:
 token: classes for representing and processing individual elements of
text, such as words and sentences
 probability: classes for representing and processing probabilistic
information.
 tree: classes for representing and processing hierarchical information
over text.
 cfg: classes for representing and processing context free grammars.
 tagger: tagging each word with a part-of-speech, a sense, etc
 parser: building trees over text (includes chart, chunk and probabilistic
parsers)
 classifier: classify text into categories (includes feature,
featureSelection, maxent, naivebayes)
 draw: visualize NLP structures and processes
 corpus: access (tagged) corpus data
 We will cover some of these explicitly as we reach the relevant topics.
 Standard interfaces for performing tasks such as part-of-speech tagging,
syntactic parsing, and text classification.
 Standard implementations for each task can be combined to solve
complex problems.
Example
 The most basic natural language processing technique is tokenization.
 Tokenization means splitting the input into tokens.
Eg: Word Tokenization –
Input : “Hey there, How are you all?”
Output : “Hey”, “there,”, “How”, “are”, “you”, “all?”
The task of converting a text from a single string to a list of tokens is known as
tokenization.
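The example above can be reproduced with plain Python, as a minimal sketch that uses only the standard library. The regular expression in the second half is an assumption chosen for illustration; it shows one simple way to split punctuation into separate tokens.

```python
import re

text = "Hey there, How are you all?"

# Whitespace tokenization: punctuation stays attached to the word,
# which matches the output shown on the slide.
whitespace_tokens = text.split()
print(whitespace_tokens)
# ['Hey', 'there,', 'How', 'are', 'you', 'all?']

# Regex tokenization: runs of word characters and single punctuation
# marks become separate tokens.
regex_tokens = re.findall(r"\w+|[^\w\s]", text)
print(regex_tokens)
# ['Hey', 'there', ',', 'How', 'are', 'you', 'all', '?']
```

NLTK offers `nltk.word_tokenize` for a more careful treatment of punctuation and contractions; it needs NLTK's tokenizer data to be downloaded first.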
Tokens and Types
 The term word can be used in two different ways:
1. To refer to an individual occurrence of a word
2. To refer to an abstract vocabulary item
 For example, the sentence “my dog likes his dog”
contains five occurrences of words, but four vocabulary
items.
 To avoid confusion, use more precise terminology:
1. Word token: an occurrence of a word
2. Word type: a vocabulary item
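The token/type distinction can be checked directly on the example sentence. This is a small sketch using only the standard library, with whitespace splitting as a simplifying assumption.

```python
from collections import Counter

sentence = "my dog likes his dog"

tokens = sentence.split()   # every occurrence of a word
types = set(tokens)         # distinct vocabulary items

print(len(tokens))  # 5
print(len(types))   # 4

# Count how many tokens belong to each type.
counts = Counter(tokens)
print(counts["dog"])  # 2
```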
Examples on python shell
 Tokenization
 Sentence Detection
 Common Usages, etc.
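As a rough idea of what sentence detection involves, here is a naive sketch using only the standard library. The `split_sentences` helper and its regular expression are assumptions made for illustration, not the method demonstrated in the talk.

```python
import re


def split_sentences(text):
    """Naively split after '.', '!' or '?' when the next word is capitalised."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [part for part in parts if part]


text = "NLTK is a Python toolkit. It ships many corpora! Dr. Smith uses it daily."
for sentence in split_sentences(text):
    print(sentence)
```

The rule wrongly breaks after the abbreviation "Dr.", so the text comes out as four pieces instead of three. Trained sentence detectors such as the one behind `nltk.sent_tokenize` (which needs NLTK's tokenizer data) are designed to cope with abbreviations like this.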
References
1. CS1573: AI Application Development, Spring 2003
(modified from Edward Loper’s notes)
2. nltk.sourceforge.net/tutorial/introduction/index.html
3. Applied Natural Language Processing, Fall 2009, by Barbara Rosario
Thank You for your patient listening!
Contact : animenon@mail.com
