Sanjivani Rural Education Society’s
Sanjivani College of Engineering, Kopargaon-423 603
Department of Information Technology
Prepared by
Mr. Umesh B. Sangule
Assistant Professor
Department of Information Technology
Department of Information Technology, SRES’s Sanjivani College of Engineering, Kopargaon-423603
Natural Language Processing (NLP)
(IT401)
DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE, KOPARGAON
Unit-V
NLP Tools and Techniques
Course Objective: To apply various NLP tools and techniques.
Course Outcome (CO5): Apply various NLP tools and techniques.
Content
Prominent NLP Libraries
CoreNLP
Natural Language Toolkit (NLTK):
• The Natural Language Toolkit (NLTK) is a platform for building Python
programs that work with human language data, with a focus on statistical
Natural Language Processing (NLP).
• It contains text-processing libraries for tokenization, parsing,
classification, stemming, tagging, and semantic reasoning.
• NLTK defines an infrastructure that can be used to build NLP programs
in Python.
• It provides basic classes for representing data relevant to natural language
processing; standard interfaces for tasks such as part-of-speech tagging,
syntactic parsing, and text classification; and standard implementations of
each task that can be combined to solve complex problems.
• NLTK was originally created in 2001 as part of a computational linguistics
course in the Department of Computer and Information Science at the
University of Pennsylvania.
• Installing NLTK:
Use pip to install NLTK on your system:
    pip install nltk
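Once NLTK is installed, a quick way to check it is to run two of the tasks listed above, tokenization and stemming. The following is a minimal sketch rather than the code from the original slides; it deliberately uses `wordpunct_tokenize` and `PorterStemmer`, which (as far as I know) need no extra data downloads. The sample sentence is an assumption chosen for illustration.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Sample sentence (illustrative only, not from the slides).
text = "NLTK provides tools for tokenizing and stemming text."

# Regex-based tokenizer: splits into runs of word characters and punctuation.
tokens = wordpunct_tokenize(text)

# Porter stemmer: strips common English suffixes to get a crude word stem.
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

for token, stem in zip(tokens, stems):
    print(f"{token:12} -> {stem}")
```

Other NLTK tokenizers and taggers rely on resources fetched with `nltk.download(...)`, so they need a network connection the first time they are used.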
Named Entity Recognition:
spaCy:
• NLP is a subfield of artificial intelligence concerned with enabling
computers to comprehend human language. It involves analyzing,
quantifying, understanding, and deriving meaning from natural languages.
• Note: Currently, the most powerful NLP models are transformer-based. BERT
from Google and the GPT family from OpenAI are examples of such models.
• Since the release of version 3.0, spaCy has supported transformer-based models.
• spaCy is a free, open-source library for NLP in Python, written in Python and
Cython. It is designed to make it easy to build systems for information
extraction or general-purpose natural language processing.
• It covers the core features needed for natural language processing, is built
for production use, and is widely used. It contains processing pipelines and
language-specific rules for tokenization.
• In the last five years, spaCy has become an industry standard with a huge
ecosystem.
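The language-specific tokenization rules mentioned above can be tried without downloading any trained model. This is a minimal sketch, not the code from the original slides: it assumes spaCy 3.x and uses `spacy.blank("en")`, which creates an English pipeline containing only the rule-based tokenizer. The example sentence is illustrative.

```python
import spacy

# A blank English pipeline: tokenizer rules only, no trained components.
nlp = spacy.blank("en")

doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# English rules keep the abbreviation "U.K." together and split "$" from "1".
tokens = [token.text for token in doc]
print(tokens)
```

Because no statistical components are loaded, attributes such as part-of-speech tags stay empty here; they require a trained pipeline.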
Gensim:
• Gensim is a well-known open-source Python library used for NLP and topic
modeling. Its ability to handle large quantities of text data and its speed in
training vector embeddings set it apart from other NLP libraries.
• Gensim also provides popular topic modeling algorithms such as Latent
Dirichlet Allocation (LDA), making it the go-to library for many users.
• It is designed to handle large text collections using data streaming and
incremental online algorithms.
• Gensim is not an all-encompassing NLP research library (like NLTK); rather,
it is a mature, targeted, and efficient collection of NLP tools for topic
modeling.
• It also includes tools for loading pre-trained word embeddings in a variety of
formats, and for using and querying a loaded embedding.
• Using its incremental online training algorithms, Gensim can process
massive, web-scale corpora.
spaCy “English” Language Model:
• spaCy is a free and open-source library for Natural Language Processing in
Python with many built-in capabilities.
• spaCy's popularity is growing steadily; the factors in its favor are the set
of features it offers, its ease of use, and the fact that the library is
actively maintained.
• The process of applying statistical analysis to a dataset is called statistical
modeling. A statistical model is a mathematical representation of observed
data.
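As a hedged illustration of what "a mathematical representation of observed data" can mean for text, the sketch below estimates bigram probabilities from counts. It is not how spaCy's models work internally and is not taken from the slides; the three-sentence corpus and the helper name `bigram_probability` are assumptions made for this example.

```python
from collections import Counter

# Observed data: a tiny corpus (assumption: invented for illustration).
corpus = ["the cat sat", "the dog sat", "the cat ran"]

bigram_counts = Counter()   # how often word pairs occur
context_counts = Counter()  # how often a word occurs as the first of a pair

for sentence in corpus:
    words = sentence.split()
    for previous, current in zip(words, words[1:]):
        bigram_counts[(previous, current)] += 1
        context_counts[previous] += 1


def bigram_probability(previous, current):
    """Maximum-likelihood estimate of P(current | previous)."""
    if context_counts[previous] == 0:
        return 0.0
    return bigram_counts[(previous, current)] / context_counts[previous]


print(bigram_probability("the", "cat"))
print(bigram_probability("cat", "sat"))
```

The model is nothing more than a table of ratios derived from the observed counts, which is the sense of "statistical model" used above.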
Language model using the spaCy library for the English language
• spaCy's statistical models are the engines that power spaCy. These models
enable spaCy to perform several NLP tasks, such as part-of-speech tagging,
named entity recognition, and dependency parsing.
• List of statistical “en” models in spaCy (the descriptions match the spaCy 2.x
model documentation; later releases describe the packages differently):
1) en_core_web_sm: English multi-task CNN trained on OntoNotes (small package).
2) en_core_web_md: English multi-task CNN trained on OntoNotes, with
GloVe vectors trained on Common Crawl (medium package).
3) en_core_web_lg: English multi-task CNN trained on OntoNotes, with
GloVe vectors trained on Common Crawl (large package, with a larger vector table).
• spaCy models are loaded with spacy.load('model_name').
• To use a spaCy model, follow the steps below:
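The steps shown on the original slides are not reproduced in this text. A minimal sketch of a typical workflow, assuming spaCy 3.x and that the model was installed beforehand with `python -m spacy download en_core_web_sm`, might look like this. The helper `load_english` is hypothetical: it falls back to a tokenizer-only pipeline when the model package is missing, so the script still runs, but tags and entities are then empty.

```python
import spacy


def load_english(name="en_core_web_sm"):
    """Load a trained English pipeline, or a blank one if it is not installed."""
    try:
        return spacy.load(name)
    except OSError:
        # spacy.load raises OSError when the model package cannot be found.
        return spacy.blank("en")


nlp = load_english()
print("Pipeline components:", nlp.pipe_names)

doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# Token-level annotations: text, part-of-speech tag, dependency label.
rows = [(token.text, token.pos_, token.dep_) for token in doc]
for row in rows:
    print(row)

# Named entities found by the pipeline (empty for the blank fallback).
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```

Swapping `en_core_web_sm` for `en_core_web_md` or `en_core_web_lg` only changes the argument passed to `spacy.load`, provided the corresponding package is installed.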
Stanford CoreNLP:
• Stanford's CoreNLP makes text analysis easy and efficient. With just a few
lines of code, CoreNLP can extract many kinds of text annotations, such as
named entities or part-of-speech tags.
• CoreNLP is written in Java and requires Java to be installed on your device,
but it offers programming interfaces for several popular programming
languages, including Python.
• It supports languages other than English, including Arabic, Chinese, German,
French, and Spanish.
CoreNLP: Stanford CoreNLP and its features
• When the download is complete, all that is left is to unzip the file with the
following commands:
• With CoreNLP installed, we can start analyzing text data in Python. First,
import py-corenlp and initialize CoreNLP.
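The slide's own code is not reproduced in this text. As a hedged illustration of what a Python client does when it talks to a running CoreNLP server, the sketch below uses only the standard library instead of py-corenlp. The server address `http://localhost:9000`, the chosen annotators, and the helper names `build_corenlp_request` and `annotate` are assumptions; the server must be started separately from the unzipped CoreNLP directory before `annotate` can succeed.

```python
import json
import urllib.parse
import urllib.request


def build_corenlp_request(
    text,
    annotators="tokenize,ssplit,pos,ner",
    server_url="http://localhost:9000",  # assumed default server address
):
    """Build the HTTP POST request that asks the server to annotate `text`."""
    properties = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return urllib.request.Request(
        f"{server_url}/?{query}", data=text.encode("utf-8"), method="POST"
    )


def annotate(text, **kwargs):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_corenlp_request(text, **kwargs)) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not contact the server, so this line always runs.
request = build_corenlp_request("Stanford University is located in California.")
print(request.get_method(), request.full_url)
```

With a server running, `annotate("...")` would return a dictionary whose `"sentences"` entry holds the per-token annotations requested through `annotators`.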
Use cases of Stanford CoreNLP:
Features of Stanford CoreNLP:
