Python for NLP and the Natural Language Toolkit
Introductions
 Anirudh K Menon
Software Engineer at IGate, working as part of the Big Data & Analytics team there.
I am a computer science engineer with experience in web development and the big data space.
My e-mail: animenon@mail.com
 You?
Natural Language Processing
 Fundamental goal: deep understanding of broad language
 Not just string processing or keyword matching!
 End systems that we want to build:
 Ambitious: speech recognition, machine translation, question answering…
 Modest: spelling correction, text categorization…
Example: Machine Translation
NLP applications
 Text Categorization
 Spelling & Grammar Corrections
 Information Extraction
 Speech Recognition
 Information Retrieval
 Synonym Generation
 Summarization
 Machine Translation
 Question Answering
 Dialog Systems
 Language generation
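Spelling correction, one of the more modest applications above, can be sketched with Levenshtein edit distance: rank words from a vocabulary by how many single-character insertions, deletions, or substitutions separate them from the misspelled input. The sketch below uses only the standard library; the vocabulary and the helper names `edit_distance` and `suggest` are hypothetical, chosen for illustration. NLTK itself ships an `edit_distance` function in `nltk.metrics.distance` that a real program could use instead.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning source into target."""
    # Before each outer step, prev[j] is the distance between
    # the source prefix processed so far and target[:j]
    prev = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        curr = [i]  # distance from source[:i] to the empty string
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            curr.append(min(
                prev[j] + 1,         # delete s_char
                curr[j - 1] + 1,     # insert t_char
                prev[j - 1] + cost,  # substitute (or keep a match)
            ))
        prev = curr
    return prev[-1]


def suggest(word: str, vocabulary: list[str], max_distance: int = 2) -> list[str]:
    """Return vocabulary words within max_distance edits, closest first."""
    scored = sorted((edit_distance(word, v), v) for v in vocabulary)
    return [v for d, v in scored if d <= max_distance]


# Hypothetical toy vocabulary, for illustration only
vocab = ["language", "languid", "lane", "natural", "nature"]
print(suggest("langauge", vocab))  # ['language']
```

A real spelling corrector would also weigh candidates by how frequent they are in a corpus, which is where the statistical approaches discussed later come in.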
Why NLP is difficult
 An NLP system needs to answer the question “who did what to whom”
 Language is ambiguous
 At all levels: lexical, phrase, semantic
 Iraqi Head Seeks Arms
 Word sense is ambiguous (head, arms)
 Stolen Painting Found by Tree
 Thematic role is ambiguous: is the tree the agent or the location?
 Ban on Nude Dancing on Governor’s Desk
 Syntactic structure (attachment) is ambiguous: is the ban or the dancing on the desk?
 Hospitals Are Sued by 7 Foot Doctors
 Semantics is ambiguous: are these seven foot doctors, or doctors who are seven feet tall?
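The attachment ambiguity in “Ban on Nude Dancing on Governor’s Desk” can be made concrete by counting parse trees. The sketch below assumes a tiny hypothetical grammar in Chomsky normal form (NP → NP PP, PP → P NP, with content words read directly as NPs) and a simplified headline, “ban on dancing on desk”. A CKY chart records how many ways each category can be built over each span. This is an illustration only, not NLTK's API; for real grammars NLTK provides `nltk.CFG.fromstring` and chart parsers such as `nltk.ChartParser`.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form.
# Binary rules: (left child, right child) -> parent
BINARY_RULES = {
    ("NP", "PP"): "NP",  # a noun phrase modified by a prepositional phrase
    ("P", "NP"): "PP",   # a preposition followed by its object
}
# Lexicon: word -> category (nouns are treated directly as NPs for brevity)
LEXICON = {"ban": "NP", "dancing": "NP", "desk": "NP", "on": "P"}


def count_parses(words: list[str], goal: str = "NP") -> int:
    """Count distinct parse trees for words rooted in goal, using CKY."""
    n = len(words)
    # chart[(i, j)][cat] = number of ways to build cat over words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        chart[(i, i + 1)][LEXICON[word]] += 1
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two children
                for left, left_count in chart[(i, k)].items():
                    for right, right_count in chart[(k, j)].items():
                        parent = BINARY_RULES.get((left, right))
                        if parent is not None:
                            chart[(i, j)][parent] += left_count * right_count
    return chart[(0, n)][goal]


headline = "ban on dancing on desk".split()
print(count_parses(headline))  # 2: is the ban, or the dancing, on the desk?
```

Each extra prepositional phrase multiplies the readings (three PPs already give five trees), which is why a parser alone cannot settle attachment without further knowledge.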
Why NLP is difficult
 Language is flexible
 New words, new meanings
 Different meanings in different contexts
 Language is subtle
 He arrived at the lecture
 He chuckled at the lecture
 He chuckled his way through the lecture
 *He arrived his way through the lecture (ungrammatical)
 Language is complex!
Corpus-based statistical approaches to tackle NLP problems
 How can a machine understand these differences?
 Decorate the cake with the frosting
 Decorate the cake with the kids
 Rule-based approaches, i.e. hand-coded syntactic constraints and preference rules:
 The verb decorate requires an animate being as agent
 The object cake is made of inanimate entities such as cream, dough, frosting…
 Such approaches have been shown to be time-consuming to build, to scale poorly, and to be very brittle in the face of new, unusual, or metaphorical uses of language
 To swallow requires an animate being as agent/subject and a physical object as object
 I swallowed his story or the actor swallowed his lines.
 The supernova swallowed the planet
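To see why hand-coded constraints are brittle, the sketch below encodes the rule for “swallow” as a selectional restriction checked against a small hand-built lexicon of semantic features. The feature names (`animate`, `physical`), the lexicon entries (including the literal example “pill”), and the function `violations` are hypothetical, for illustration. Run on the examples above, it rejects the metaphorical uses even though English speakers understand them without effort.

```python
# Hypothetical hand-built lexicon of semantic features
FEATURES = {
    "i": {"animate"},
    "actor": {"animate"},
    "supernova": {"physical"},
    "pill": {"physical"},
    "planet": {"physical"},
    "story": set(),  # abstract: neither animate nor physical
    "lines": set(),
}

# Hand-coded rule: swallow requires an animate subject and a physical object
RESTRICTIONS = {
    "swallow": {"subject": "animate", "object": "physical"},
}


def violations(verb: str, subject: str, obj: str) -> list[str]:
    """Return the selectional restrictions that (verb, subject, obj) breaks."""
    rule = RESTRICTIONS[verb]
    problems = []
    if rule["subject"] not in FEATURES[subject]:
        problems.append(f"subject '{subject}' is not {rule['subject']}")
    if rule["object"] not in FEATURES[obj]:
        problems.append(f"object '{obj}' is not {rule['object']}")
    return problems


examples = [("i", "pill"), ("i", "story"), ("actor", "lines"), ("supernova", "planet")]
for subj, obj in examples:
    print(subj, "swallowed", obj, "->", violations("swallow", subj, obj) or "ok")
```

Only the literal sentence passes; every metaphorical one is flagged, and fixing that means adding yet more hand-written rules.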
Corpus-based statistical approaches to tackle NLP problems
 Feature extraction (usually linguistically motivated)
 Statistical models
 Data (corpora, labels, linguistic resources)
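The three ingredients above fit together even in the simplest statistical classifier. The sketch below is a multinomial naive Bayes text classifier with add-one (Laplace) smoothing: bag-of-words counts are the features, class-conditional word probabilities are the statistical model, and a labeled corpus is the data. The tiny training set and the labels `sports`/`cooking` are made up for illustration. NLTK's own implementation is `nltk.NaiveBayesClassifier`, which takes feature dictionaries rather than raw text.

```python
import math
from collections import Counter, defaultdict


def features(text: str) -> list[str]:
    """Feature extraction: a lowercase bag of words."""
    return text.lower().split()


def train(examples: list[tuple[str, str]]):
    """Estimate class counts and per-class word counts from labeled data."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in features(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def classify(text: str, model) -> str:
    """Return the label with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in features(text):
            if word in vocabulary:  # skip words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus, for illustration only
corpus = [
    ("the team won the match", "sports"),
    ("a late goal won the game", "sports"),
    ("whisk the eggs and sugar", "cooking"),
    ("bake the cake with the frosting", "cooking"),
]
model = train(corpus)
print(classify("the goal in the match", model))  # sports
print(classify("frosting for the cake", model))  # cooking
```

Working in log space avoids numeric underflow when many small probabilities are multiplied over a long document.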
Intro to NLTK
 NLTK is a set of Python modules for carrying out many common natural language tasks.
 NLTK defines an infrastructure that can be used to build NLP programs in Python.
 It provides basic classes for representing data relevant to natural language processing.
 There are versions for Windows, OS X, Unix, and Linux; detailed instructions are on the Installation page of the NLTK website.
 Install the library (any platform):
$ pip install --upgrade nltk
 Then download the data packages (corpora, trained models) from the Python shell:
>>> import nltk
>>> nltk.download('all')
NLTK: Top-Level Organization
 NLTK is organized as a flat hierarchy of packages and modules.
 Each module provides the tools necessary to address a specific task.
 Modules contain two types of classes:
 Data-oriented classes are used to represent information relevant to natural language processing.
 Task-oriented classes encapsulate the resources and methods needed to perform a specific task.
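The split between data-oriented and task-oriented classes can be illustrated in a few lines. In the sketch below, the hypothetical `Token` dataclass only holds information (a word and an optional tag), while the hypothetical `LookupTagger` bundles the resource it needs (a word-to-tag table) with the method that performs the task. These are not NLTK's classes; NLTK's real counterparts include `nltk.tag.DefaultTagger` and `nltk.tag.UnigramTagger`, which tag plain lists of strings.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Token:
    """Data-oriented class: one word occurrence and its (optional) tag."""
    text: str
    tag: Optional[str] = None


class LookupTagger:
    """Task-oriented class: a resource (tag table) plus the method using it."""

    def __init__(self, table: dict[str, str], default: str = "NN"):
        self.table = {word.lower(): tag for word, tag in table.items()}
        self.default = default  # fallback tag for words not in the table

    def tag(self, tokens: list[Token]) -> list[Token]:
        """Return new tokens with a tag filled in for each word."""
        return [
            replace(tok, tag=self.table.get(tok.text.lower(), self.default))
            for tok in tokens
        ]


# Hypothetical tag table using Penn Treebank-style tags
tagger = LookupTagger({"my": "PRP$", "his": "PRP$", "likes": "VBZ"})
sentence = [Token(w) for w in "my dog likes his dog".split()]
for tok in tagger.tag(sentence):
    print(tok.text, tok.tag)
```

Keeping `Token` immutable means a tagger returns new data instead of mutating its input, so several task-oriented components can be chained safely.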
Modules
 The NLTK modules include the following (module names as in early NLTK releases; current releases use the packages given in parentheses):
 token (nltk.tokenize): classes for representing and processing individual elements of text, such as words and sentences
 probability (nltk.probability): classes for representing and processing probabilistic information
 tree (nltk.tree): classes for representing and processing hierarchical information over text
 cfg (nltk.grammar): classes for representing and processing context-free grammars
 tagger (nltk.tag): tagging each word with a part of speech, a sense, etc.
 parser (nltk.parse; chunking is in nltk.chunk): building trees over text (includes chart, chunk, and probabilistic parsers)
 classifier (nltk.classify): classifying text into categories (includes feature, featureSelection, maxent, naivebayes)
 draw (nltk.draw): visualizing NLP structures and processes
 corpus (nltk.corpus): access to (tagged) corpus data
 We will cover some of these explicitly as we reach the relevant topics.
 NLTK provides standard interfaces for performing tasks such as part-of-speech tagging, syntactic parsing, and text classification.
 Standard implementations for each task can be combined to solve complex problems.
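As an illustration of the probabilistic information the probability module handles, the sketch below builds a conditional frequency distribution of word bigrams and turns it into maximum-likelihood estimates, P(next | prev) = count(prev, next) / Σ_w count(prev, w). It uses only the standard library; in NLTK the corresponding classes are `nltk.probability.ConditionalFreqDist` and `nltk.probability.MLEProbDist`. The function name `bigram_mle` and the sample text are hypothetical.

```python
from collections import Counter, defaultdict


def bigram_mle(tokens: list[str]) -> dict[str, dict[str, float]]:
    """Estimate P(next | prev) by the relative frequency of bigrams."""
    cfd = defaultdict(Counter)  # prev word -> counts of the words that follow it
    for prev, nxt in zip(tokens, tokens[1:]):
        cfd[prev][nxt] += 1
    probs = {}
    for prev, counter in cfd.items():
        total = sum(counter.values())
        probs[prev] = {nxt: count / total for nxt, count in counter.items()}
    return probs


tokens = "my dog likes his dog and his dog likes my cat".split()
probs = bigram_mle(tokens)
print(probs["his"])  # {'dog': 1.0}
print(probs["dog"])
```

Maximum-likelihood estimates give zero probability to any bigram not seen in training, which is why practical models add smoothing.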
Example
 The most basic natural language processing technique is tokenization.
 Tokenization means splitting the input into tokens.
E.g., word tokenization:
Input: “Hey there, How are you all?”
Output (splitting on whitespace): “Hey”, “there,”, “How”, “are”, “you”, “all?”
The task of converting a text from a single string to a list of tokens is known as tokenization.
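The output above is exactly what plain whitespace splitting produces: punctuation stays attached to the adjacent word (“there,”, “all?”). A slightly better sketch, using only Python's `re` module, makes each punctuation mark a token of its own. The pattern is an assumption for illustration, similar in spirit to NLTK's `wordpunct_tokenize`; NLTK's `nltk.word_tokenize`, which relies on a downloaded Punkt model, also handles harder cases such as contractions.

```python
import re

text = "Hey there, How are you all?"

# Whitespace tokenization: reproduces the output shown above
print(text.split())
# ['Hey', 'there,', 'How', 'are', 'you', 'all?']

# Regex tokenization: runs of word characters, or single punctuation marks
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")
print(TOKEN_PATTERN.findall(text))
# ['Hey', 'there', ',', 'How', 'are', 'you', 'all', '?']
```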
Tokens and Types
 The term word can be used in two different ways:
1. To refer to an individual occurrence of a word
2. To refer to an abstract vocabulary item
 For example, the sentence “my dog likes his dog” contains five occurrences of words, but only four vocabulary items.
 To avoid confusion, use more precise terminology:
1. Word token: an occurrence of a word
2. Word type: a vocabulary item
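Once a text is tokenized, counting tokens and types is straightforward: the number of tokens is the length of the token list, and the number of types is the size of its set. This sketch uses `collections.Counter` as a stand-in for NLTK's `nltk.probability.FreqDist`, which offers a similar counting interface.

```python
from collections import Counter

tokens = "my dog likes his dog".split()
types = set(tokens)

print(len(tokens))  # 5 word tokens
print(len(types))   # 4 word types
print(Counter(tokens).most_common(1))  # [('dog', 2)]
```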
Examples in the Python shell
 Tokenization
 Sentence Detection
 Common Usages, etc.
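Sentence detection can be approximated by splitting after terminal punctuation that is followed by whitespace and a capital letter. The regex sketch below is a deliberately naive assumption, shown to illustrate the problem rather than solve it: it breaks on abbreviations such as “Dr.”, which is why NLTK's `nltk.sent_tokenize` relies instead on the trained Punkt model.

```python
import re

# Naive rule: a sentence ends at . ! or ? followed by whitespace and a capital letter
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text: str) -> list[str]:
    """Split text at naive sentence boundaries."""
    return SENTENCE_BOUNDARY.split(text.strip())


print(split_sentences("Hey there. How are you all? Fine, thanks!"))
# ['Hey there.', 'How are you all?', 'Fine, thanks!']

print(split_sentences("I met Dr. Smith. He was late."))
# ['I met Dr.', 'Smith.', 'He was late.']  (wrong split after "Dr.")
```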
References
1. CS1573: AI Application Development, Spring 2003 (modified from Edward Loper’s notes)
2. nltk.sourceforge.net/tutorial/introduction/index.html
3. Applied Natural Language Processing, Fall 2009, by Barbara Rosario
Thank you for listening!
Contact: animenon@mail.com

More Related Content

What's hot

Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingCloudxLab
 
Nltk - Boston Text Analytics
Nltk - Boston Text AnalyticsNltk - Boston Text Analytics
Nltk - Boston Text Analyticsshanbady
 
You too can nlp - PyBay 2018 lightning talk
You too can nlp - PyBay 2018 lightning talkYou too can nlp - PyBay 2018 lightning talk
You too can nlp - PyBay 2018 lightning talkJacob Perkins
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introductionRobert Lujo
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language ProcessingMichel Bruley
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)Yuriy Guts
 
The Role of Natural Language Processing in Information Retrieval
The Role of Natural Language Processing in Information RetrievalThe Role of Natural Language Processing in Information Retrieval
The Role of Natural Language Processing in Information RetrievalTony Russell-Rose
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)VenkateshMurugadas
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processingdhruv_chaudhari
 
Natural Language Processing and Machine Learning
Natural Language Processing and Machine LearningNatural Language Processing and Machine Learning
Natural Language Processing and Machine LearningKarthik Sankar
 
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...Edureka!
 
Natural Language Processing (NLP) - Introduction
Natural Language Processing (NLP) - IntroductionNatural Language Processing (NLP) - Introduction
Natural Language Processing (NLP) - IntroductionAritra Mukherjee
 
NLTK: Natural Language Processing made easy
NLTK: Natural Language Processing made easyNLTK: Natural Language Processing made easy
NLTK: Natural Language Processing made easyoutsider2
 
UCU NLP Summer Workshops 2017 - Part 2
UCU NLP Summer Workshops 2017 - Part 2UCU NLP Summer Workshops 2017 - Part 2
UCU NLP Summer Workshops 2017 - Part 2Yuriy Guts
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processingMinh Pham
 
Natural Language Processing for Games Research
Natural Language Processing for Games ResearchNatural Language Processing for Games Research
Natural Language Processing for Games ResearchJose Zagal
 
Natural Language Processing Crash Course
Natural Language Processing Crash CourseNatural Language Processing Crash Course
Natural Language Processing Crash CourseCharlie Greenbacker
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingYasir Khan
 

What's hot (20)

Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Nltk - Boston Text Analytics
Nltk - Boston Text AnalyticsNltk - Boston Text Analytics
Nltk - Boston Text Analytics
 
You too can nlp - PyBay 2018 lightning talk
You too can nlp - PyBay 2018 lightning talkYou too can nlp - PyBay 2018 lightning talk
You too can nlp - PyBay 2018 lightning talk
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introduction
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language Processing
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
The Role of Natural Language Processing in Information Retrieval
The Role of Natural Language Processing in Information RetrievalThe Role of Natural Language Processing in Information Retrieval
The Role of Natural Language Processing in Information Retrieval
 
Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)Introduction to Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Natural Language Processing and Machine Learning
Natural Language Processing and Machine LearningNatural Language Processing and Machine Learning
Natural Language Processing and Machine Learning
 
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...
Stemming And Lemmatization Tutorial | Natural Language Processing (NLP) With ...
 
Natural Language Processing (NLP) - Introduction
Natural Language Processing (NLP) - IntroductionNatural Language Processing (NLP) - Introduction
Natural Language Processing (NLP) - Introduction
 
NLTK: Natural Language Processing made easy
NLTK: Natural Language Processing made easyNLTK: Natural Language Processing made easy
NLTK: Natural Language Processing made easy
 
UCU NLP Summer Workshops 2017 - Part 2
UCU NLP Summer Workshops 2017 - Part 2UCU NLP Summer Workshops 2017 - Part 2
UCU NLP Summer Workshops 2017 - Part 2
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processing
 
Natural Language Processing for Games Research
Natural Language Processing for Games ResearchNatural Language Processing for Games Research
Natural Language Processing for Games Research
 
Natural Language Processing Crash Course
Natural Language Processing Crash CourseNatural Language Processing Crash Course
Natural Language Processing Crash Course
 
OpenNLP demo
OpenNLP demoOpenNLP demo
OpenNLP demo
 
NLTK
NLTKNLTK
NLTK
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 

Viewers also liked

Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingMariana Soffer
 
Machine Learning in NLP
Machine Learning in NLPMachine Learning in NLP
Machine Learning in NLPVijay Ganti
 
Statistical Learning and Text Classification with NLTK and scikit-learn
Statistical Learning and Text Classification with NLTK and scikit-learnStatistical Learning and Text Classification with NLTK and scikit-learn
Statistical Learning and Text Classification with NLTK and scikit-learnOlivier Grisel
 
Natural Language Toolkit (NLTK), Basics
Natural Language Toolkit (NLTK), Basics Natural Language Toolkit (NLTK), Basics
Natural Language Toolkit (NLTK), Basics Prakash Pimpale
 
GATE : General Architecture for Text Engineering
GATE : General Architecture for Text EngineeringGATE : General Architecture for Text Engineering
GATE : General Architecture for Text EngineeringAhmed Magdy Ezzeldin, MSc.
 
Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEDiana Maynard
 
Text classification in scikit-learn
Text classification in scikit-learnText classification in scikit-learn
Text classification in scikit-learnJimmy Lai
 
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...Jimmy Lai
 
Sentiment analysis-by-nltk
Sentiment analysis-by-nltkSentiment analysis-by-nltk
Sentiment analysis-by-nltkWei-Ting Kuo
 
Natural language processing
Natural language processingNatural language processing
Natural language processingHansi Thenuwara
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processingrohitnayak
 
How Sentiment Analysis works
How Sentiment Analysis worksHow Sentiment Analysis works
How Sentiment Analysis worksCJ Jenkins
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingPranav Gupta
 
Neuro Linguistic Programming
Neuro Linguistic ProgrammingNeuro Linguistic Programming
Neuro Linguistic Programmingsmjk
 
Statistical Machine Learning for Text Classification with scikit-learn and NLTK
Statistical Machine Learning for Text Classification with scikit-learn and NLTKStatistical Machine Learning for Text Classification with scikit-learn and NLTK
Statistical Machine Learning for Text Classification with scikit-learn and NLTKOlivier Grisel
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningRahul Jain
 

Viewers also liked (19)

NLTK in 20 minutes
NLTK in 20 minutesNLTK in 20 minutes
NLTK in 20 minutes
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Machine Learning in NLP
Machine Learning in NLPMachine Learning in NLP
Machine Learning in NLP
 
Statistical Learning and Text Classification with NLTK and scikit-learn
Statistical Learning and Text Classification with NLTK and scikit-learnStatistical Learning and Text Classification with NLTK and scikit-learn
Statistical Learning and Text Classification with NLTK and scikit-learn
 
Natural Language Toolkit (NLTK), Basics
Natural Language Toolkit (NLTK), Basics Natural Language Toolkit (NLTK), Basics
Natural Language Toolkit (NLTK), Basics
 
GATE : General Architecture for Text Engineering
GATE : General Architecture for Text EngineeringGATE : General Architecture for Text Engineering
GATE : General Architecture for Text Engineering
 
Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATE
 
Text classification in scikit-learn
Text classification in scikit-learnText classification in scikit-learn
Text classification in scikit-learn
 
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ...
 
Sentiment analysis-by-nltk
Sentiment analysis-by-nltkSentiment analysis-by-nltk
Sentiment analysis-by-nltk
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Spam Filtering
Spam FilteringSpam Filtering
Spam Filtering
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
How Sentiment Analysis works
How Sentiment Analysis worksHow Sentiment Analysis works
How Sentiment Analysis works
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Neuro Linguistic Programming
Neuro Linguistic ProgrammingNeuro Linguistic Programming
Neuro Linguistic Programming
 
Statistical Machine Learning for Text Classification with scikit-learn and NLTK
Statistical Machine Learning for Text Classification with scikit-learn and NLTKStatistical Machine Learning for Text Classification with scikit-learn and NLTK
Statistical Machine Learning for Text Classification with scikit-learn and NLTK
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 

Similar to Nltk

Natural language processing: feature extraction
Natural language processing: feature extractionNatural language processing: feature extraction
Natural language processing: feature extractionGabriel Hamilton
 
Natural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptxNatural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptxAlyaaMachi
 
MACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMassimo Schenone
 
NLP Deep Learning with Tensorflow
NLP Deep Learning with TensorflowNLP Deep Learning with Tensorflow
NLP Deep Learning with Tensorflowseungwoo kim
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processingpunedevscom
 
Prolog (present)
Prolog (present) Prolog (present)
Prolog (present) Melody Joey
 
AI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
AI UNIT 3 - SRCAS JOC.pptx enjoy this pptAI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
AI UNIT 3 - SRCAS JOC.pptx enjoy this pptpavankalyanadroittec
 
Open nlp presentationss
Open nlp presentationssOpen nlp presentationss
Open nlp presentationssChandan Deb
 
Frame-Script and Predicate logic.pptx
Frame-Script and Predicate logic.pptxFrame-Script and Predicate logic.pptx
Frame-Script and Predicate logic.pptxnilesh405711
 
Module 8: Natural language processing Pt 1
Module 8:  Natural language processing Pt 1Module 8:  Natural language processing Pt 1
Module 8: Natural language processing Pt 1Sara Hooker
 
Natural language processing using python
Natural language processing using pythonNatural language processing using python
Natural language processing using pythonPrakash Anand
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingVeenaSKumar2
 
Pycon India 2018 Natural Language Processing Workshop
Pycon India 2018   Natural Language Processing WorkshopPycon India 2018   Natural Language Processing Workshop
Pycon India 2018 Natural Language Processing WorkshopLakshya Sivaramakrishnan
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)Kuppusamy P
 
Pycon ke word vectors
Pycon ke   word vectorsPycon ke   word vectors
Pycon ke word vectorsOsebe Sammi
 
Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxNatural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxSHIBDASDUTTA
 

Similar to Nltk (20)

Natural language processing: feature extraction
Natural language processing: feature extractionNatural language processing: feature extraction
Natural language processing: feature extraction
 
Natural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptxNatural Language Processing_in semantic web.pptx
Natural Language Processing_in semantic web.pptx
 
NLP
NLPNLP
NLP
 
MACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSISMACHINE-DRIVEN TEXT ANALYSIS
MACHINE-DRIVEN TEXT ANALYSIS
 
NLP Deep Learning with Tensorflow
NLP Deep Learning with TensorflowNLP Deep Learning with Tensorflow
NLP Deep Learning with Tensorflow
 
REPORT.doc
REPORT.docREPORT.doc
REPORT.doc
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Prolog (present)
Prolog (present) Prolog (present)
Prolog (present)
 
AI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
AI UNIT 3 - SRCAS JOC.pptx enjoy this pptAI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
AI UNIT 3 - SRCAS JOC.pptx enjoy this ppt
 
ppt
pptppt
ppt
 
ppt
pptppt
ppt
 
Open nlp presentationss
Open nlp presentationssOpen nlp presentationss
Open nlp presentationss
 
Frame-Script and Predicate logic.pptx
Frame-Script and Predicate logic.pptxFrame-Script and Predicate logic.pptx
Frame-Script and Predicate logic.pptx
 
Module 8: Natural language processing Pt 1
Module 8:  Natural language processing Pt 1Module 8:  Natural language processing Pt 1
Module 8: Natural language processing Pt 1
 
Natural language processing using python
Natural language processing using pythonNatural language processing using python
Natural language processing using python
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Pycon India 2018 Natural Language Processing Workshop
Pycon India 2018   Natural Language Processing WorkshopPycon India 2018   Natural Language Processing Workshop
Pycon India 2018 Natural Language Processing Workshop
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
 
Pycon ke word vectors
Pycon ke   word vectorsPycon ke   word vectors
Pycon ke word vectors
 
Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxNatural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptx
 

Recently uploaded

Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxVenkatasubramani13
 
Rock Songs common codes and conventions.pptx
Rock Songs common codes and conventions.pptxRock Songs common codes and conventions.pptx
Rock Songs common codes and conventions.pptxFinatron037
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionajayrajaganeshkayala
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationGiorgio Carbone
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerPavel Šabatka
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...PrithaVashisht1
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Vladislav Solodkiy
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructuresonikadigital1
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxDwiAyuSitiHartinah
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best PracticesDataArchiva
 
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptxCCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptxdhiyaneswaranv1
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introductionsanjaymuralee1
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Guido X Jansen
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityAggregage
 
Optimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in LogisticsOptimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in LogisticsThinkInnovation
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?sonikadigital1
 

Recently uploaded (16)

Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptx
 
Rock Songs common codes and conventions.pptx
Rock Songs common codes and conventions.pptxRock Songs common codes and conventions.pptx
Rock Songs common codes and conventions.pptx
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual intervention
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - Presentation
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayer
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructure
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices
 
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptxCCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introduction
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 
Optimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in LogisticsOptimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in Logistics
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?
 

Nltk

  • 1. Python for NLP and the Natural Language Toolkit
  • 2. Introductions  Anirudh K Menon Software Engineer at IGate, working as part of the Big Data & Analytics Team there. I am a computer Science Engineer having experience in web development and big data space. My e-mail : animenon@mail.com  You?
  • 3. Natural Language Processing  Fundamental goal: deep understanding of broad language  Not just string processing or keyword matching!  End systems that we want to build:  Ambitious: speech recognition, machine translation, question answering…  Modest: spelling correction, text categorization…
  • 5. NLP applications  Text Categorization  Spelling & Grammar Corrections  Information Extraction  Speech Recognition  Information Retrieval  Synonym Generation  Summarization  Machine Translation  Question Answering  Dialog Systems  Language generation
  • 6. Why NLP is difficult  A NLP system needs to answer the question “who did what to whom”  Language is ambiguous  At all levels: lexical, phrase, semantic  Iraqi Head Seeks Arms  Word sense is ambiguous (head, arms)  Stolen Painting Found by Tree  Thematic role is ambiguous: tree is agent or location?  Ban on Nude Dancing on Governor’s Desk  Syntactic structure (attachment) is ambiguous: is the ban or the dancing on the desk?  Hospitals Are Sued by 7 Foot Doctors  Semantics is ambiguous : what is 7 foot?
  • 7. Why NLP is difficult
    - Language is flexible
      - New words, new meanings
      - Different meanings in different contexts
    - Language is subtle
      - He arrived at the lecture
      - He chuckled at the lecture
      - He chuckled his way through the lecture
      - *He arrived his way through the lecture (ungrammatical)
    - Language is complex!
  • 8. Corpus-based statistical approaches to tackle NLP problems
    - How can a machine understand these differences?
      - Decorate the cake with the frosting
      - Decorate the cake with the kids
    - Rule-based approaches, i.e. hand-coded syntactic constraints and preference rules:
      - The verb decorate requires an animate being as agent
      - The object cake is formed from inanimate entities (cream, dough, frosting…)
    - Such approaches have been shown to be time-consuming to build, do not scale up well, and are very brittle when faced with new, unusual, or metaphorical uses of language
      - To swallow requires an animate being as agent/subject and a physical object as object
        - I swallowed his story; the actor swallowed his lines (the object is not a physical object)
        - The supernova swallowed the planet (the agent is not an animate being)
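A rule-based treatment of the cake example might look like the sketch below. The word lists, the function name and the role labels are hypothetical, written only to illustrate hand-coded preference rules. Any word missing from the lists (here “sprinkles”) cannot be handled, which is the brittleness described on the slide.

```python
# Hypothetical hand-coded lexicons (not taken from any real resource).
ANIMATE = {"kids", "children", "baker"}
DECORATIONS = {"cream", "dough", "frosting"}


def role_of_with_phrase(noun: str) -> str:
    """Role of <noun> in 'Decorate the cake with the <noun>'.

    Hand-coded preference rules:
      - an animate noun is a co-agent (decorating together with the kids);
      - a listed inanimate decoration is an instrument (decorating with frosting);
      - a noun in neither list cannot be resolved.
    """
    if noun in ANIMATE:
        return "co-agent"
    if noun in DECORATIONS:
        return "instrument"
    return "unknown"


for noun in ["frosting", "kids", "sprinkles"]:
    print(noun, "->", role_of_with_phrase(noun))
# frosting -> instrument
# kids -> co-agent
# sprinkles -> unknown
```

Covering “sprinkles”, “icing”, “friends” and every other word would mean editing the lists by hand, one entry at a time.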
  • 9. Corpus-based statistical approaches to tackle NLP problems
    - Feature extraction (usually linguistically motivated)
    - Statistical models
    - Data (corpora, labels, linguistic resources)
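The three ingredients can be seen together in a small text categorization sketch built on NLTK's NaiveBayesClassifier: a bag-of-words feature extractor, a statistical model, and labelled data. This assumes NLTK 3.x. The training sentences and the labels “sports” and “weather” are hypothetical toy data, not a real corpus, and a real system would need far more examples.

```python
from nltk.classify import NaiveBayesClassifier


def extract_features(text):
    """Feature extraction: which lower-cased words occur in the text."""
    return {f"contains({word})": True for word in text.lower().split()}


# Data: a hypothetical, tiny labelled corpus standing in for real training data.
train_data = [
    ("the team won the match", "sports"),
    ("a great goal in the final minute", "sports"),
    ("heavy rain expected tomorrow", "weather"),
    ("sunny skies and warm temperatures", "weather"),
]

# Statistical model: Naive Bayes estimates P(label) and P(feature | label)
# from the labelled examples.
classifier = NaiveBayesClassifier.train(
    [(extract_features(text), label) for text, label in train_data]
)

sports_guess = classifier.classify(extract_features("the team scored a goal"))
weather_guess = classifier.classify(extract_features("rain and warm skies"))
print(sports_guess)   # sports
print(weather_guess)  # weather
```

Words never seen in training (such as “scored”) are simply ignored by the classifier, so the decision rests on the words it has statistics for.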
  • 10. Intro to NLTK
    - NLTK is a set of Python modules for carrying out many common natural language tasks.
    - NLTK defines an infrastructure that can be used to build NLP programs in Python.
    - It provides basic classes for representing data relevant to natural language processing.
    - It is available for Windows, OS X, Unix, and Linux. Detailed installation instructions are on the NLTK website.
      - Install the package (on any platform):
          $ pip install --upgrade nltk
      - Then download the data packages (corpora, models, etc.) from Python:
          >>> import nltk
          >>> nltk.download('all')
  • 11. NLTK: Top-Level Organization
    - NLTK is organized as a flat hierarchy of packages and modules.
    - Each module provides the tools necessary to address a specific task.
    - Modules contain two types of classes:
      - Data-oriented classes are used to represent information relevant to natural language processing.
      - Task-oriented classes encapsulate the resources and methods needed to perform a specific task.
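As an illustration of the two kinds of classes (the choice of classes is our example, not taken from the slide), a regular-expression chunk parser is task-oriented: it takes tagged words and produces an nltk.Tree, which is data-oriented. This assumes NLTK 3.x; the tagged sentence is supplied by hand and the chunk rule is a hypothetical example.

```python
import nltk

# Input data: a part-of-speech-tagged sentence as (word, tag) pairs,
# tagged by hand for this example.
tagged = [("the", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD")]

# Task-oriented class: a chunk parser built from a rule that groups an
# optional determiner, any number of adjectives, and a noun into an NP.
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")

# The task-oriented class returns a data-oriented nltk.Tree.
result = chunker.parse(tagged)
print(result)

# Pull out the words of each NP chunk from the tree.
noun_phrases = [st.leaves() for st in result.subtrees(lambda t: t.label() == "NP")]
print(noun_phrases)  # [[('the', 'DT'), ('little', 'JJ'), ('dog', 'NN')]]
```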
  • 12. Modules
    - The NLTK modules include:
      - token: classes for representing and processing individual elements of text, such as words and sentences
      - probability: classes for representing and processing probabilistic information
      - tree: classes for representing and processing hierarchical information over text
      - cfg: classes for representing and processing context-free grammars
      - tagger: tagging each word with a part of speech, a sense, etc.
      - parser: building trees over text (includes chart, chunk, and probabilistic parsers)
      - classifier: classifying text into categories (includes feature, featureSelection, maxent, naivebayes)
      - draw: visualizing NLP structures and processes
      - corpus: accessing (tagged) corpus data
    - We will cover some of these explicitly as we reach the relevant topics.
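The module names on this slide appear to come from an older NLTK release (the deck's references date from 2003 to 2009). In current NLTK 3.x the corresponding functionality lives in packages such as nltk.tokenize, nltk.probability, nltk.tree, nltk.grammar, nltk.tag, nltk.parse, nltk.classify, nltk.draw and nltk.corpus. As a small sketch of the tree module, the example below builds a parse tree from its bracketed string form; the sentence and its bracketing are hypothetical.

```python
from nltk.tree import Tree

# Bracketed notation: (label child child ...), with words as the leaves.
tree = Tree.fromstring("(S (NP (DT the) (NN dog)) (VP (VBD barked)))")

print(tree.label())   # S
print(tree.leaves())  # ['the', 'dog', 'barked']
print(tree.pos())     # [('the', 'DT'), ('dog', 'NN'), ('barked', 'VBD')]

# Subtrees can be indexed by position, like nested lists.
print(tree[0].label())  # NP
```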
  • 13.
    - Standard interfaces for performing tasks such as part-of-speech tagging, syntactic parsing, and text classification.
    - Standard implementations for each task can be combined to solve complex problems.
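Because taggers share a common interface, one tagger can be plugged into another as a fallback, and a tokenizer can feed either of them. The sketch below assumes NLTK 3.x and a hypothetical two-sentence training set; it combines wordpunct_tokenize, a UnigramTagger and a DefaultTagger used as backoff.

```python
from nltk.tag import DefaultTagger, UnigramTagger
from nltk.tokenize import wordpunct_tokenize

# Hypothetical, tiny hand-tagged training corpus.
train_sents = [
    [("the", "DT"), ("dog", "NN"), ("barked", "VBD")],
    [("a", "DT"), ("cat", "NN"), ("slept", "VBD")],
]

# The unigram tagger assigns each known word its most frequent training tag
# and falls back to the default tagger ("NN") for words it has not seen.
default_tagger = DefaultTagger("NN")
tagger = UnigramTagger(train_sents, backoff=default_tagger)

tokens = wordpunct_tokenize("the dog chased a ball")
tagged = tagger.tag(tokens)
print(tagged)
# [('the', 'DT'), ('dog', 'NN'), ('chased', 'NN'), ('a', 'DT'), ('ball', 'NN')]
```

The unseen verb “chased” falls through to the default NN tag, which is why practical taggers are trained on large tagged corpora such as those available through nltk.corpus.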
  • 14. Example
    - One of the most basic natural language processing techniques is tokenization.
    - Tokenization is the task of converting a text from a single string into a list of tokens.
      E.g., word tokenization by splitting on whitespace:
        Input: “Hey there, How are you all?”
        Output: “Hey”, “there,”, “How”, “are”, “you”, “all?”
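The slide's output is exactly what plain whitespace splitting produces. A short comparison with NLTK's wordpunct_tokenize, which separates punctuation into its own tokens, is sketched below, assuming NLTK 3.x. The more commonly used nltk.word_tokenize gives a similar split here but relies on a Punkt model that must first be fetched with nltk.download, so it is left out of this sketch.

```python
from nltk.tokenize import wordpunct_tokenize

text = "Hey there, How are you all?"

# Whitespace splitting keeps punctuation attached to the neighbouring word.
whitespace_tokens = text.split()
print(whitespace_tokens)
# ['Hey', 'there,', 'How', 'are', 'you', 'all?']

# wordpunct_tokenize splits runs of letters/digits from runs of punctuation.
nltk_tokens = wordpunct_tokenize(text)
print(nltk_tokens)
# ['Hey', 'there', ',', 'How', 'are', 'you', 'all', '?']
```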
  • 15. Tokens and Types
    - The term word can be used in two different ways:
      1. To refer to an individual occurrence of a word
      2. To refer to an abstract vocabulary item
    - For example, the sentence “my dog likes his dog” contains five occurrences of words, but four vocabulary items.
    - To avoid confusion, use more precise terminology:
      1. Word token: an occurrence of a word
      2. Word type: a vocabulary item
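Counting tokens and types for the slide's sentence takes a few lines of plain Python: the list of words holds the tokens, and the set of distinct words holds the types. NLTK's FreqDist, a subclass of collections.Counter, could be used in place of Counter here.

```python
from collections import Counter

sentence = "my dog likes his dog"
tokens = sentence.split()  # word tokens: every occurrence counts
types = set(tokens)        # word types: distinct vocabulary items

print(len(tokens))  # 5
print(len(types))   # 4

# One type ("dog") accounts for two tokens.
print(Counter(tokens)["dog"])  # 2
```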
  • 16. Examples in the Python shell
    - Tokenization
    - Sentence Detection
    - Common Usages, etc.
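For sentence detection, NLTK's sent_tokenize uses a pre-trained Punkt model that has to be downloaded first. To keep the sketch self-contained, the example below instead uses a naive regular-expression rule (our assumption, not NLTK's default behaviour) with nltk.tokenize.RegexpTokenizer, assuming NLTK 3.x. The second call shows the kind of abbreviation error that a trained model such as Punkt is designed to avoid.

```python
from nltk.tokenize import RegexpTokenizer

# Naive rule: a sentence ends at ".", "!" or "?" followed by whitespace.
# gaps=True makes the pattern match the separators between sentences
# rather than the sentences themselves.
sentence_splitter = RegexpTokenizer(r"(?<=[.!?])\s+", gaps=True)

sentences = sentence_splitter.tokenize("Hey there. How are you all? Welcome to the session!")
print(sentences)
# ['Hey there.', 'How are you all?', 'Welcome to the session!']

# The naive rule wrongly splits after the abbreviation "Mr."
abbreviation_case = sentence_splitter.tokenize("Mr. Smith arrived. He sat down.")
print(abbreviation_case)
# ['Mr.', 'Smith arrived.', 'He sat down.']
```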
  • 17. References
    1. CS1573: AI Application Development, Spring 2003 (modified from Edward Loper’s notes)
    2. nltk.sourceforge.net/tutorial/introduction/index.html
    3. Applied Natural Language Processing, Fall 2009, by Barbara Rosario
  • 18. Thank you for listening!
    Contact: animenon@mail.com