Sanjivani Rural Education Society’s
Sanjivani College of Engineering, Kopargaon-423 603
Department of Information Technology
Prepared by
Mr. Umesh B. Sangule
Assistant Professor
Department of Information Technology
Department of Information Technology, SRES’s Sanjivani College of Engineering, Kopargaon-423603
Natural Language Processing (NLP)
(IT401)
DEPARTMENT OF INFORMATION TECHNOLOGY, SCOE, KOPARGAON
Unit-V
NLP Tools and Techniques
Course Objective: To apply various NLP tools and techniques.
Course Outcome (CO5): Apply various NLP tools and techniques.
Natural Language Toolkit (NLTK):
The Natural Language Toolkit (NLTK) is a platform for building Python programs
that work with human language data, for use in statistical Natural Language
Processing (NLP).
It contains text processing libraries for tokenization, parsing,
classification, stemming, tagging, and semantic reasoning.
NLTK defines an infrastructure that can be used to build NLP programs
in Python.
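A minimal sketch of two of these text-processing tasks, tokenization and stemming, assuming NLTK 3.x is installed. wordpunct_tokenize and PorterStemmer are chosen because they need no extra data downloads (the more common word_tokenize relies on separately downloaded Punkt tokenizer data); the sample sentence is illustrative.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

text = "The runners were running quickly."

# Split the text into word and punctuation tokens with a regular expression
# (this tokenizer needs no downloaded NLTK data).
tokens = wordpunct_tokenize(text)

# Reduce each token to its stem with the Porter stemming algorithm.
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

print(tokens)
print(stems)
```

Stems are not always dictionary words; the Porter algorithm only strips suffixes by rule, so related forms such as "running" and "run" end up with the same stem.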
Prominent NLP Libraries
It provides basic classes for representing data relevant to natural language
processing; standard interfaces for performing tasks such as part-of-speech tagging,
syntactic parsing, and text classification; and standard implementations for
each task that can be combined to solve complex problems.
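As an example of NLTK's standard text-classification interface, the sketch below trains NLTK's NaiveBayesClassifier on a few (feature dictionary, label) pairs. The bag-of-words feature function and the tiny hand-written training set are assumptions for demonstration only; a usable classifier needs a properly labelled corpus.

```python
from nltk.classify import NaiveBayesClassifier

def bag_of_words(text):
    """Map each lowercase word in the text to True (simple presence features)."""
    return {word: True for word in text.lower().split()}

# Tiny illustrative training data: (sentence, label) pairs.
train_data = [
    ("great movie loved it", "pos"),
    ("wonderful acting and a great story", "pos"),
    ("terrible movie hated it", "neg"),
    ("boring plot and terrible acting", "neg"),
]

# NLTK classifiers are trained on (feature_dict, label) pairs.
train_set = [(bag_of_words(text), label) for text, label in train_data]
classifier = NaiveBayesClassifier.train(train_set)

print(classifier.classify(bag_of_words("a great story")))
classifier.show_most_informative_features(3)
```

The same train/classify interface is shared by NLTK's other classifiers, so the feature function can be reused when swapping in a different model.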
NLTK was originally created in 2001 as part of a computational linguistics
course in the Department of Computer and Information Science at the
University of Pennsylvania.
Installing NLTK:
Install NLTK on your system with pip:
pip install nltk
Named Entity Recognition:
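NLTK's usual named entity pipeline (word_tokenize, pos_tag, then ne_chunk) relies on several separately downloaded data packages. As a download-free sketch of the same idea, the code below groups consecutive proper nouns in an already POS-tagged sentence into entity chunks with nltk.RegexpParser. This rule-based chunker is a simplified stand-in, not NLTK's trained NE model, and the hand-tagged sentence is illustrative.

```python
import nltk

# A sentence that has already been tokenized and POS-tagged (Penn Treebank tags).
tagged = [
    ("Barack", "NNP"), ("Obama", "NNP"), ("visited", "VBD"),
    ("Paris", "NNP"), ("in", "IN"), ("2009", "CD"), (".", "."),
]

# Chunk grammar: one or more consecutive proper nouns form an "NE" chunk.
chunker = nltk.RegexpParser("NE: {<NNP>+}")
tree = chunker.parse(tagged)

# Collect the text of every NE chunk in the resulting parse tree.
entities = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NE")
]
print(entities)
```

Unlike a trained recognizer, this rule cannot tell a PERSON from a LOCATION; it only marks where a name-like span begins and ends.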
spaCy:
NLP is a subfield of artificial intelligence concerned with enabling
computers to understand human language. NLP involves analyzing,
quantifying, understanding, and deriving meaning from natural languages.
Note: Currently, the most powerful NLP models are transformer-based. BERT
from Google and the GPT family from OpenAI are examples of such models.
Since the release of version 3.0, spaCy supports transformer-based models.
spaCy is a free, open-source library for NLP in Python, written in Cython.
spaCy is designed to make it easy to build systems for information extraction
or general-purpose natural language processing.
It provides the features most NLP applications need, ships production-ready
code, and is widely used. It contains processing pipelines and
language-specific rules for tokenization.
In the last five years, spaCy has become an industry standard with a large
ecosystem.
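A minimal sketch of a spaCy processing pipeline, assuming spaCy 3.x is installed. spacy.blank("en") creates an English pipeline that contains only the language-specific tokenizer (no trained model download is needed), and the rule-based "sentencizer" component is added to split sentences. The sample text is illustrative.

```python
import spacy

# Blank English pipeline: English tokenization rules, no trained components.
nlp = spacy.blank("en")

# Add a rule-based sentence splitter to the processing pipeline.
nlp.add_pipe("sentencizer")

doc = nlp("Let's go to N.Y.! spaCy splits text into tokens and sentences.")

print(nlp.pipe_names)
print([token.text for token in doc])
print([sent.text for sent in doc.sents])
```

The English tokenizer rules split the contraction "Let's" into two tokens but keep the abbreviation "N.Y." together while separating the final "!".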
Gensim:
Gensim is a well-known open-source Python library used for NLP and topic
modeling. Its ability to handle vast quantities of text data and its speed in
training vector embeddings set it apart from other NLP libraries.
Moreover, Gensim provides popular topic modeling algorithms such as LDA
(Latent Dirichlet Allocation), making it the go-to library for many users.
It is designed to handle large text collections using data streaming and
incremental online algorithms.
Gensim is not an all-encompassing NLP research library (like NLTK); rather,
it is a mature, targeted, and efficient collection of NLP tools for topic
modeling.
It also includes tools for loading pre-trained word embeddings in a variety of
formats, as well as using and querying a loaded embedding.
Using its incremental online training algorithms, Gensim can easily process
massive and web-scale corpora.
Spacy “English” Language Model:
spaCy is a free and open-source library for Natural Language Processing in
Python with a lot of in-built capabilities.
spaCy's popularity is growing steadily thanks to the set of features it offers,
its ease of use, and the fact that the library is actively kept up to date.
The process of applying statistical analysis to a dataset is called statistical
modeling. A statistical model is a mathematical representation of observed
data.
Language model using Spacy library for English language,
spaCy's statistical models are the engines that power spaCy. These models let
spaCy perform several NLP tasks, such as part-of-speech tagging, named entity
recognition, and dependency parsing.
List of Statistical “en” Models in spaCy:
1) en_core_web_sm: English multi-task CNN trained on OntoNotes.
2) en_core_web_md: English multi-task CNN trained on OntoNotes, with
GloVe vectors trained on Common Crawl.
3) en_core_web_lg: English multi-task CNN trained on OntoNotes, with a
larger table of GloVe vectors trained on Common Crawl.
We load a spaCy model with spacy.load('model_name'). The model package must be
installed first, for example with: python -m spacy download en_core_web_sm
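The usual steps are: install the model package, load it with spacy.load(), process text, and read the annotations from the resulting Doc. A minimal sketch, assuming spaCy 3.x and that en_core_web_sm has been installed (for example with python -m spacy download en_core_web_sm). The load_english helper is hypothetical: because spacy.load() raises OSError when the package is missing, it falls back to a blank English pipeline, which only tokenizes and therefore yields no tags, parses or entities.

```python
import spacy

def load_english(model_name="en_core_web_sm"):
    """Hypothetical helper: load a trained English pipeline, or a blank one if
    the model package is not installed (spacy.load raises OSError then)."""
    try:
        return spacy.load(model_name)
    except OSError:
        return spacy.blank("en")

nlp = load_english()
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

# Token-level annotations: text, coarse part-of-speech tag, dependency label.
for token in doc:
    print(token.text, token.pos_, token.dep_)

# Named entities found by the trained NER component (none with a blank pipeline).
for ent in doc.ents:
    print(ent.text, ent.label_)
```

Switching to en_core_web_md or en_core_web_lg only changes the model name passed to spacy.load(); the rest of the code stays the same.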
Stanford CoreNLP:
Stanford CoreNLP makes text data analysis easy and efficient. With just a few
lines of code, CoreNLP can annotate text with many kinds of linguistic
information, such as named entities and part-of-speech tags.
CoreNLP is written in Java and requires Java to be installed on your device,
but it offers programming interfaces for several popular programming
languages, including Python.
It supports several languages besides English, including Arabic, Chinese,
German, French, and Spanish.
CoreNLP: Stanford CoreNLP and its features,
When the download is complete, all that is left is to unzip the downloaded
archive.
After installing CoreNLP and starting the CoreNLP server, we can analyze text
data in Python. First, let’s import py-corenlp (installed with pip install
pycorenlp) and connect it to the running server.
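A library-free sketch of what a client such as py-corenlp does: it sends the text to a running CoreNLP server over HTTP and receives JSON annotations. It assumes a server has already been started from the unzipped CoreNLP directory, as in the official documentation, with java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000. The function names annotate and tokens_with_tags are hypothetical; with py-corenlp itself the equivalent call is StanfordCoreNLP('http://localhost:9000').annotate(text, properties={'annotators': ..., 'outputFormat': 'json'}).

```python
import json
import urllib.parse
import urllib.request

def annotate(text, server_url="http://localhost:9000",
             annotators="tokenize,ssplit,pos,ner"):
    """Hypothetical helper: POST text to a running CoreNLP server and return
    the parsed JSON output."""
    properties = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    request = urllib.request.Request(
        f"{server_url}/?{query}",
        data=text.encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))

def tokens_with_tags(output):
    """Flatten CoreNLP JSON output into (word, POS tag, NER label) triples."""
    return [
        (token["word"], token.get("pos"), token.get("ner"))
        for sentence in output["sentences"]
        for token in sentence["tokens"]
    ]
```

With the server running, tokens_with_tags(annotate("Barack Obama was born in Hawaii.")) returns one (word, POS tag, NER label) triple per token. The annotators string controls which CoreNLP pipeline stages run; adding more stages (for example parse) makes each request slower.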
Use cases of Stanford CoreNLP:
Features of Stanford CoreNLP: