MR. JAYANAND KAMBLE
WELCOME TO JAK’S TUTORIAL
Prof. Jayanand Kamble
CONTENTS:
• Definition
• Issues and strategies
• Application domain
• Tools for NLP
• Linguistic organization of NLP
• NLP vs PLP
OVERVIEW
• Human Evolution
• How did Stone Age humans communicate?
• Early humans could express thoughts and feelings by means of speech or
by signs or gestures. They could signal with fire and smoke, drums, or
whistles.
• Why language?
• Language helps us express our feelings and thoughts. It is unique to
our species, and it lets different cultures and societies express their
own ideas and customs.
• What are the world's spoken languages?
• Around 7000 languages are spoken in the world today.
INDIAN CONTEXT
• India is a multilingual country with great linguistic and cultural
diversity
• 22 official languages are listed in the Indian constitution
• However, the 2001 Census of India reported:
• 122 major languages
• 1,599 other regional languages
• 2,371 scripts
• 30 languages are spoken by more than one million native
speakers
• 122 are spoken by more than 10,000 people
• 20% of the population understand English
• 80% do not
• Some Indian languages rank among the top 20 in the world by
number of speakers.
• Hindi: 4th (~350 million)
• Bangla: 5th (~230 million)
• Marathi: 10th (~84 million)
DEFINITION: NLP
• Humans communicate through some form of language, either text or speech.
• For computers and humans to interact, computers need to understand the
natural languages that humans use.
• Natural language processing is about making computers learn, understand,
analyze, manipulate, and interpret natural (human) languages.
• NLP stands for Natural Language Processing. It sits at the intersection of
computer science, human language (linguistics), and artificial intelligence.
• Natural Language Processing (NLP) is the capacity of a computer to "understand"
natural language text at a level that allows meaningful interaction between the
computer and a person working in a particular application domain.
• Natural language processing is required when you want an
intelligent system such as a robot to act on your instructions, or when
you want to hear a decision from a dialogue-based clinical expert system,
etc.
• The ability of machines to interpret human language is now at the core
of many applications that we use every day: chatbots, email
classification and spam filters, search engines, grammar checkers, voice
assistants, and language translators.
• The input and output of an NLP system can be speech or written text.
COMPONENTS OF NLP
• NLP has two components: Natural Language
Understanding (NLU) and Natural Language Generation (NLG).
• Natural Language Understanding (NLU) involves
transforming human language into a machine-readable format.
• It helps the machine understand and analyze human language
by extracting information such as keywords, emotions,
relations, and semantics from large volumes of text.
• Natural Language Generation (NLG) acts as a translator that
converts computerized data into a natural language
representation.
• It mainly involves text planning, sentence planning, and text
realization.
• NLU is generally harder than NLG.
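The three NLG stages named above can be sketched with a toy pipeline. This is only an illustration under stated assumptions: the weather record, its field names, and the function names `plan_text`, `plan_sentences`, `realize`, and `generate` are all hypothetical and do not come from the slides or from any library.

```python
# Toy NLG pipeline: text planning -> sentence planning -> text realization.
# The record layout and every function name here are illustrative assumptions.

def plan_text(record):
    """Text planning: decide which facts to mention and in what order."""
    return [("city", record["city"]), ("temp_c", record["temp_c"]), ("sky", record["sky"])]

def plan_sentences(facts):
    """Sentence planning: group the chosen facts into sentence-sized specifications."""
    values = dict(facts)
    return [
        {"subject": "the temperature in " + values["city"], "verb": "is",
         "complement": f"{values['temp_c']} degrees Celsius"},
        {"subject": "the sky", "verb": "is", "complement": values["sky"]},
    ]

def realize(specs):
    """Text realization: turn each specification into a capitalized, punctuated sentence."""
    sentences = []
    for spec in specs:
        text = f"{spec['subject']} {spec['verb']} {spec['complement']}."
        sentences.append(text[0].upper() + text[1:])
    return " ".join(sentences)

def generate(record):
    return realize(plan_sentences(plan_text(record)))

print(generate({"city": "Pune", "temp_c": 31, "sky": "clear"}))
```

Real NLG systems make these decisions from data and grammar resources rather than fixed templates; the sketch only shows how the three stages hand work to one another.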
APPLICATION DOMAIN
• Text processing: word processing, e-mail, spelling and grammar
checkers
• Interfaces to databases: query languages, information retrieval,
data mining, text summarization
• Expert systems: explanations, disease diagnosis
• Linguistics: machine translation, content analysis, writers'
assistants, language generation
APPLICATION DOMAIN EXAMPLES
• Search Autocorrect and Autocomplete
• Language Translator
• Social Media Monitoring: More and more people use social media to post
their thoughts about a particular product, policy, or matter.
• Chatbots: Customer service and experience are among the most important
things for any company.
• Survey Analysis: Surveys are an important way of evaluating a company’s
performance. Companies conduct many surveys to get customer feedback on
various products.
• Targeted Advertising
• Hiring and Recruitment: The Human Resources department is an integral part
of every company. It has the important job of selecting the right
employees for the company.
• Voice Assistants: Google Assistant, Apple Siri, and Amazon Alexa are all
voice assistants.
• Grammar Checkers: This is one of the most widely used applications of
natural language processing.
• Email Filtering: Services such as Gmail classify incoming mail and filter spam.
• Sentiment Analysis: Natural language understanding is particularly difficult for
machines when it comes to opinions, given that humans often use sarcasm and
irony.
• Text Classification: Text classification, a text analysis task that also includes
sentiment analysis, involves automatically understanding, processing, and
categorizing unstructured text.
• Text Extraction: Text extraction, or information extraction, automatically detects
specific information in a text, such as names, companies, places, and more. Detecting
named entities in this way is known as named entity recognition.
• Machine Translation: Machine translation (MT) is one of the first applications of
natural language processing.
• Text Summarization: There are two ways of using natural language processing to
summarize data: extraction-based summarization ‒ which extracts key phrases
and creates a summary, without adding any extra information ‒ and abstraction-
based summarization, which creates new phrases paraphrasing the original
source.
• Market Intelligence: Marketers can benefit from natural language processing to
learn more about their customers and use those insights to create more effective
strategies.
• Intent Classification: Intent classification consists of identifying the goal or
purpose that underlies a text.
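Extraction-based summarization, described in the list above, can be sketched with plain word frequencies. This is a minimal sketch, assuming a naive sentence splitter and a tiny hand-picked stopword list; `split_sentences`, `summarize`, and `STOPWORDS` are hypothetical names, and summing frequencies favors longer sentences, which real systems usually correct for.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list (an assumption made for this sketch).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "it", "that", "for", "on", "are"}

def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]
        return sum(freq[w] for w in tokens)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in chosen)

text = ("NLP helps computers understand language. Language is ambiguous. "
        "Computers process language with NLP tools.")
print(summarize(text))
```

Because the summary only copies sentences from the source, nothing new is added, which is what separates this from abstraction-based summarization.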
TOOLS FOR NLP
• Programming languages and software: Prolog, ALE, Lisp/Scheme,
C/C++, Python, Java
• Statistical methods: Markov models, probabilistic grammars,
text-based analysis
• Abstract models: context-free grammars (BNF), attribute grammars,
predicate calculus and other semantic models, knowledge-based and
ontological methods
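As an illustration of the statistical methods listed above, here is a minimal sketch of a first-order Markov (bigram) model over words. The three-sentence corpus and the names `train_bigram_model` and `probability` are assumptions made for the example; the estimate is plain maximum likelihood with no smoothing.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count word-to-next-word transitions, using <s> and </s> as boundary markers."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def probability(counts, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev); 0.0 when prev was never seen."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

corpus = ["the dog barks", "the dog sleeps", "the cat sleeps"]  # toy corpus
model = train_bigram_model(corpus)
print(probability(model, "the", "dog"))  # "dog" follows "the" in 2 of 3 cases
```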
TOOLS FOR NLP: EXAMPLES
• Natural Language Processing tools are helping companies get
insights from unstructured text data like emails, online reviews,
social media posts, and more.
• Many tools make NLP accessible to a business. They fall into
two groups:
• open-source libraries and
• SaaS tools.
• Open-source:
• Open-source libraries are free, flexible, and allow developers to fully
customize them.
• However, they are not always cost-effective: you’ll need to spend time
building and training open-source tools before you can reap the
benefits.
• SaaS tools
• SaaS tools are a great alternative if you don’t want to invest a lot of
time building complex infrastructures or spend money on extra
resources.
• MonkeyLearn:
• MonkeyLearn is a user-friendly, NLP-powered platform that helps you gain
valuable insights from text data.
• To get started, you can use its pre-trained models to perform text analysis
tasks such as sentiment analysis, topic classification, or keyword extraction.
• Aylien:
• Aylien is a SaaS API that uses deep learning and NLP to analyze large volumes of
text-based data, such as academic publications, real-time content from news outlets,
and social media data.
• It can be used for NLP tasks like text summarization, article extraction, entity
extraction, and sentiment analysis, among others.
• IBM Watson:
• IBM Watson is a suite of AI services hosted on IBM Cloud.
• One of its key features is Natural Language Understanding, which lets
you identify and extract keywords, categories, emotions, entities, and more.
• Amazon Comprehend:
• Amazon Comprehend is an NLP service integrated with the Amazon
Web Services infrastructure.
• This API can be used for NLP tasks such as sentiment analysis, topic
modeling, entity recognition, and more.
• Google Cloud:
• The Google Cloud Natural Language API provides several
pre-trained models for sentiment analysis, content classification, and
entity extraction, among others.
• It offers AutoML Natural Language, which lets you build
customized machine learning models.
• As part of the Google Cloud infrastructure, it uses Google
question-answering and language understanding technology.
• NLTK
• The Natural Language Toolkit (NLTK) for Python is one of
the leading tools for building NLP models.
• Focused on research and education in the NLP field, NLTK is
backed by an active community and a range of
tutorials, sample datasets, and resources, including the
comprehensive handbook Natural Language Processing
with Python.
• Stanford CoreNLP
• Stanford CoreNLP is a popular library built and maintained by the NLP
community at Stanford University.
• It is written in Java, so a JDK must be installed, but wrappers exist for
many programming languages.
• The CoreNLP toolkit performs a variety of NLP tasks, such as
part-of-speech tagging, tokenization, and named entity recognition.
• Its main advantages include scalability and optimization for
speed, making it a good choice for complex tasks.
• TextBlob
• TextBlob is a Python library that works as an extension of
NLTK, letting you perform the same NLP tasks through a much
more intuitive and user-friendly interface.
• Its learning curve is gentler than that of other open-source
libraries, so it is an excellent choice for beginners who
want to tackle NLP tasks like sentiment analysis, text
classification, part-of-speech tagging, and more.
• spaCy
• spaCy is one of the newer open-source Python libraries for
Natural Language Processing.
• It is fast, easy to use, well documented, and designed to
support large volumes of data, and it ships with a series of
pretrained NLP models that make the job even easier.
• Unlike NLTK or CoreNLP, which offer a number of algorithms for
each task, spaCy keeps its menu short and serves up the best available
option for each task at hand.
• Gensim
• Gensim is a highly specialized Python library that largely deals
with topic modeling tasks using algorithms like Latent Dirichlet
Allocation (LDA).
• It’s also excellent at recognizing text similarities, indexing texts,
and navigating different documents.
• This library is fast, scalable, and good at handling large volumes of
data.
NATURAL LANGUAGES VS. COMPUTER LANGUAGES
• Ambiguity is the primary difference between natural and computer
languages.
• Formal programming languages are designed to be unambiguous, i.e.
they can be defined by a grammar that produces a unique parse for
each sentence in the language.
• Programming languages are also designed for efficient parsing, i.e.
they are (or are close to) deterministic context-free languages (DCFLs).
• A sentence in a DCFL can be parsed in O(n) time, where n is the length of the
string.
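To make the O(n) claim concrete, the sketch below recognizes a small deterministic context-free language, balanced brackets, in a single left-to-right pass with a stack. The language choice and the names `PAIRS` and `is_balanced` are assumptions made for illustration; a real compiler front end would use an LL or LR parser built on the same principle.

```python
# Each closing bracket maps to the opening bracket it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}

def is_balanced(text):
    """Recognize the balanced-bracket language in one pass: O(n) time, one stack."""
    stack = []
    for ch in text:
        if ch in PAIRS.values():
            stack.append(ch)                  # opening bracket: remember it
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False                  # closing bracket with no matching opener
    return not stack                          # leftover openers mean the input is unbalanced

print(is_balanced("f(a[0], {b: (c)})"))
print(is_balanced("(]"))
```

Each character is examined once and every decision is forced by the current character and the top of the stack, so there is never a second parse to consider. Natural-language sentences offer no such guarantee.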
LINGUISTIC ORGANIZATION OF NLP
• Phonetics and phonology
• Morphology
• Lexical Analysis
• Syntactic Analysis
• Semantic Analysis
• Pragmatics
• Discourse
[Figure: NLP architecture and stages of processing, with ambiguity at every stage. Stage labels, translated from Marathi: phonetics and phonology, morphology, lexical analysis, syntactic analysis, semantic analysis, pragmatics, discourse.]
PHONETICS
• Processing of speech
• Challenges
• Homophones: bank (finance) vs. bank (river bank)
• Near homophones: maatraa vs. maatra (Hindi)
• Word boundary: aajaayenge can be read as aa jaayenge (will come) or
aaj aayenge (will come today)
• Phrase boundary
• PhD students are especially exhorted to attend as such seminars are integral to one's
postgraduate education (the boundary may fall after “as” or after “as such”)
• Disfluency: ah, um, ahem, etc.
• The best part of my job is … well … the best part of my job is the responsibility.
WORD SEGMENTATION
• Breaking a string of characters (graphemes) into a sequence of words.
• In some written languages (e.g. Chinese), words are not separated by
spaces.
• Even in English, characters other than white space can
separate words, e.g. , ; . - : ( )
• Examples from English URLs:
• jumptheshark.com → jump the shark com
• myspace.com/pluckerswingbar → two possible readings:
• myspace com pluckers wing bar
• myspace com plucker swing bar
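The URL example can be reproduced with a small dictionary-based segmenter that returns every split whose pieces are all known words. This is a sketch under stated assumptions: the five-word vocabulary and the name `segmentations` are hypothetical, and the plain recursion is exponential in the worst case, so practical segmenters add memoization and word probabilities to rank the candidates.

```python
def segmentations(text, vocabulary):
    """Return every way to split text into a sequence of words from vocabulary."""
    if not text:
        return [[]]                      # one way to segment the empty string
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in vocabulary:
            for rest in segmentations(text[end:], vocabulary):
                results.append([prefix] + rest)
    return results

VOCAB = {"plucker", "pluckers", "swing", "wing", "bar"}  # toy vocabulary
for words in segmentations("pluckerswingbar", VOCAB):
    print(" ".join(words))
```

Both readings come back as valid, which is exactly the ambiguity the slide points out. Choosing between them needs extra knowledge, such as word frequencies or context.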
MORPHOLOGICAL ANALYSIS
• Morphology is the field of linguistics that studies the internal structure of
words.
• A morpheme is the smallest linguistic unit that has semantic meaning, i.e. a
unit that cannot be further divided
• (e.g. in, come, and -ing, forming incoming)
• Morphological analysis is the task of segmenting a word into its
morphemes
• Carried → carry + ed (past tense)
• Independently → in+(depend+ent)+ly
• Googlers → (Google+er)+s (plural)
• Unlockable → un+(lock+able)? or (un+lock)+able?
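The segmentations above can be approximated by stripping known prefixes and suffixes until a known root remains. The sketch below is illustrative only: the root, prefix, and suffix lists and the names `restore_root` and `analyze` are assumptions, just two spelling changes are handled, and the result is a flat morpheme list, so it cannot tell un+(lock+able) from (un+lock)+able.

```python
ROOTS = {"carry", "depend", "google", "lock"}       # toy lexicon of root words
PREFIXES = ["in", "un"]
SUFFIXES = ["ly", "ent", "able", "ed", "er", "s"]

def restore_root(stem):
    """Undo two common spelling changes: y -> i (carri) and a dropped final e (googl)."""
    candidates = [stem]
    if stem.endswith("i"):
        candidates.append(stem[:-1] + "y")
    candidates.append(stem + "e")
    for candidate in candidates:
        if candidate in ROOTS:
            return candidate
    return None

def analyze(word):
    """Return a flat list of morphemes for word, or None if no analysis is found."""
    word = word.lower()
    root = restore_root(word)
    if root:
        return [root]
    for prefix in PREFIXES:
        if word.startswith(prefix):
            rest = analyze(word[len(prefix):])
            if rest:
                return [prefix] + rest
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            rest = analyze(word[:-len(suffix)])
            if rest:
                return rest + [suffix]
    return None

for w in ["Carried", "Independently", "Googlers", "Unlockable"]:
    print(w, "->", "+".join(analyze(w)))
```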
MORPHOLOGY
• Collection of word formation rules that build words from root words
• Nouns: plural (boy → boys); gender marking (Ravi → Ravina)
• Verbs: tense (stretch → stretched); aspect (e.g. perfective: sit →
had sat); modality (e.g. request: khaanaa → khaaiie)
• Crucial first step in NLP
• Languages rich in morphology: e.g. Dravidian and other Indian languages,
Hungarian, Turkish.
• Languages poor in morphology: e.g. Chinese, English.
• Languages with rich morphology have the advantage of easier
processing at higher stages of processing.
• A task of interest to computer science: building finite-state
machines for word morphology.
LEXICAL ANALYSIS
• Essentially refers to dictionary access and obtaining
the properties of the word.
• e.g. dog:
• noun (lexical property)
• takes 's' in the plural (morphological property)
• animate (semantic property)
• 4-legged (semantic property)
• carnivore (semantic property)
• Challenge:
• Lexical or word sense disambiguation
LEXICAL DISAMBIGUATION
• First step: Part of Speech Disambiguation
• Dog as a noun (animal)
• Dog as a verb (to pursue, go after, or follow somebody
closely)
• Ex. A shadowy figure was dogging their every move.
• Sense Disambiguation:
• Dog (as an animal)
• Dog (as a very detestable person)
• Needs word relationships in a context:
• The chair emphasized the need for adult education.
• Very common in day-to-day communication:
• Satellite channel ad: “Watch what you want, when you want”
(two senses of watch)
• Watch: wristwatch / watching something
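One classic way to use word relationships in context is to compare the sentence with a dictionary gloss for each sense and pick the sense with the largest word overlap, a simplified form of the Lesk algorithm. The sketch below assumes two hand-written glosses for "watch" and a small stopword list; `SENSES`, `content_words`, and `disambiguate` are hypothetical names, and a real system would take glosses from a lexical resource such as WordNet.

```python
# Hand-written glosses for two senses of "watch" (assumptions for this sketch).
SENSES = {
    "watch": {
        "wristwatch": "a small clock worn on the wrist to tell the time",
        "observe": "to look at a show or a channel on television for a period of time",
    }
}
STOPWORDS = {"a", "an", "the", "to", "on", "of", "or", "at", "for", "is", "my", "i"}

def content_words(text):
    """Lowercase the text, strip basic punctuation, and drop stopwords."""
    return {w.strip(".,!?").lower() for w in text.split()} - STOPWORDS

def disambiguate(word, sentence):
    """Pick the sense whose gloss shares the most content words with the sentence."""
    context = content_words(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:      # ties keep the sense listed first
            best_sense, best_overlap = sense, overlap
    return best_sense

print(disambiguate("watch", "I watch a cricket show on television"))
print(disambiguate("watch", "My watch shows the wrong time on my wrist"))
```

The advertisement on the slide works precisely because its context is too thin to separate the two senses, so an overlap count like this one would have little to go on.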
TECHNOLOGICAL DEVELOPMENTS BRING IN NEW TERMS,
ADDITIONAL MEANINGS/NUANCES FOR EXISTING TERMS
• Justify: as in justify the right margin (word processing context)
• Xeroxed: a new verb
• Digital Trace: a new expression
• Communifaking: pretending to talk on a mobile phone when you are
actually not
• Discomgooglation: anxiety or discomfort at not being able to access
the internet
• Helicopter Parenting: over-parenting
AMBIGUITY OF MULTI-WORDS
• The grandfather kicked the bucket after suffering from
cancer. (kicked the bucket: died)
• This job is a piece of cake. (a piece of cake: something very easy;
child's play)
• He is the dark horse of the match. (dark horse: someone who has a
surprising ability or skill)
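AMBIGUITY OF MULTI-WORDS: GOOGLE TRANSLATION
• Marathi output for the three idiomatic sentences, each rendered word for word:
• आजोबांनी कॅन्सरने ग्रासल्यानंतर बादलीला लाथ मारली. (literally: Grandfather kicked a bucket after being afflicted by cancer.)
• हे काम केकचा तुकडा आहे. (literally: This job is a slice of cake.)
• तो सामन्याचा डार्क हॉर्स आहे. (literally: He is the match's “dark horse”, with the English phrase transliterated.)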
SYNTACTIC ANALYSIS
1. PART OF SPEECH (POS) TAGGING
• Annotate each word in a sentence with a PoS
• Useful for subsequent syntactic parsing and word
sense disambiguation
• Example:
I    ate  the  spaghetti  with  meatballs.
Pro  V    Det  N          Prep  N
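The tagged example above can be reproduced with the simplest possible tagger, a dictionary lookup. This is a minimal sketch, assuming a six-word lexicon built from the slide's sentence; `LEXICON` and `pos_tag` are hypothetical names. Real taggers use context to resolve words like "dog" that can carry more than one tag, which a lookup table cannot do.

```python
# Toy lexicon covering only the example sentence (an assumption for this sketch).
LEXICON = {
    "i": "Pro", "ate": "V", "the": "Det",
    "spaghetti": "N", "with": "Prep", "meatballs": "N",
}

def pos_tag(sentence, default="N"):
    """Tag each token by dictionary lookup; unknown words get the default tag."""
    tokens = sentence.rstrip(".").split()
    return [(token, LEXICON.get(token.lower(), default)) for token in tokens]

for token, tag in pos_tag("I ate the spaghetti with meatballs."):
    print(f"{token}/{tag}", end=" ")
print()
```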
2. PHRASE CHUNKING
• Phrase chunking is a phase of natural language
processing that segments a sentence into its
sub-constituents, such as noun, verb, and prepositional
phrases, abbreviated as NP, VP, and PP, respectively.
• Find all non-recursive noun phrases (NPs) and verb
phrases (VPs) in a sentence:
• [NP I] [VP ate] [NP the spaghetti] [PP with] [NP meatballs].
• [NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to] [NP only # 1.8 billion] [PP in] [NP September]
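Chunking can be sketched as a single pass over PoS-tagged tokens that groups adjacent tags into flat phrases. The rules below (an NP is an optional determiner plus nouns, or a pronoun; each verb or preposition forms its own chunk) and the names `chunk` and `bracket` are simplifying assumptions. They are enough for the first example above but not for multi-word verb groups such as "will narrow".

```python
def chunk(tagged):
    """Group (word, tag) pairs into non-recursive NP, VP, and PP chunks."""
    chunks = []
    i = 0
    while i < len(tagged):
        word, tag = tagged[i]
        if tag in ("Det", "N", "Pro"):
            # NP: an optional determiner followed by nouns, or a single pronoun.
            j = i + 1 if tag == "Det" else i
            while j < len(tagged) and tagged[j][1] == "N":
                j += 1
            j = max(j, i + 1)            # always consume at least one token
            chunks.append(("NP", [w for w, _ in tagged[i:j]]))
            i = j
        elif tag == "V":
            chunks.append(("VP", [word]))
            i += 1
        elif tag == "Prep":
            chunks.append(("PP", [word]))
            i += 1
        else:
            chunks.append((tag, [word]))  # unknown tags pass through unchanged
            i += 1
    return chunks

def bracket(chunks):
    """Render chunks in the bracketed notation used on the slide."""
    return " ".join(f"[{label} {' '.join(words)}]" for label, words in chunks)

tagged = [("I", "Pro"), ("ate", "V"), ("the", "Det"),
          ("spaghetti", "N"), ("with", "Prep"), ("meatballs", "N")]
print(bracket(chunk(tagged)))
```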
NLP VS PLP
Aspect | NLP | PLP
Domain of discourse | broad: what can be expressed | narrow: what can be computed
Lexicon | large/complex | small/simple
Grammatical constructs | many and varied (declarative, interrogative, fragments, etc.) | few (declarative, imperative)
Meanings of an expression | many | one
Tools and techniques | morphological analysis, syntactic analysis, semantic analysis, integration of world knowledge | lexical analysis, context-free parsing, code generation/compiling, interpreting
Prof. Jayanand Kamble
47

More Related Content

Similar to Introduction to NLP.pptx

NLP,expert,robotics.pptx
NLP,expert,robotics.pptxNLP,expert,robotics.pptx
NLP,expert,robotics.pptxAmanBadesra1
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingVeenaSKumar2
 
Building NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML GroupBuilding NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML Groupbotsplash.com
 
NLP presentation.pptx
NLP presentation.pptxNLP presentation.pptx
NLP presentation.pptxpysgpa
 
Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEDiana Maynard
 
Natural language processing and search
Natural language processing and searchNatural language processing and search
Natural language processing and searchNathan McMinn
 
naturallanguageprocessing-160722053804.pdf
naturallanguageprocessing-160722053804.pdfnaturallanguageprocessing-160722053804.pdf
naturallanguageprocessing-160722053804.pdfshakeelAsghar6
 
Natural Language Processing-(NLP).pptx
Natural Language Processing-(NLP).pptxNatural Language Processing-(NLP).pptx
Natural Language Processing-(NLP).pptxSHIBDASDUTTA
 
NLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
NLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnNLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
NLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnRAtna29
 
Natural language processing
Natural language processingNatural language processing
Natural language processingHansi Thenuwara
 
GATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaGATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaDiana Maynard
 
An Overview of Natural Language Processing.pptx
An Overview of Natural Language Processing.pptxAn Overview of Natural Language Processing.pptx
An Overview of Natural Language Processing.pptxSoftxai
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)Kuppusamy P
 
Text Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEText Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEDiana Maynard
 
Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxNatural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxSHIBDASDUTTA
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introductionananth
 

Similar to Introduction to NLP.pptx (20)

NLP,expert,robotics.pptx
NLP,expert,robotics.pptxNLP,expert,robotics.pptx
NLP,expert,robotics.pptx
 
sample PPT.pptx
sample PPT.pptxsample PPT.pptx
sample PPT.pptx
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Building NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML GroupBuilding NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML Group
 
NLP presentation.pptx
NLP presentation.pptxNLP presentation.pptx
NLP presentation.pptx
 
Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATE
 
Natural language processing and search
Natural language processing and searchNatural language processing and search
Natural language processing and search
 
naturallanguageprocessing-160722053804.pdf
naturallanguageprocessing-160722053804.pdfnaturallanguageprocessing-160722053804.pdf
naturallanguageprocessing-160722053804.pdf
 
Natural Language Processing-(NLP).pptx
Natural Language Processing-(NLP).pptxNatural Language Processing-(NLP).pptx
Natural Language Processing-(NLP).pptx
 
NLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
NLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnNLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
NLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
GATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaGATE: a text analysis tool for social media
GATE: a text analysis tool for social media
 
NLP.pptx
NLP.pptxNLP.pptx
NLP.pptx
 
WomenTech_Event
WomenTech_EventWomenTech_Event
WomenTech_Event
 
An Overview of Natural Language Processing.pptx
An Overview of Natural Language Processing.pptxAn Overview of Natural Language Processing.pptx
An Overview of Natural Language Processing.pptx
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
 
1 Introduction.ppt
1 Introduction.ppt1 Introduction.ppt
1 Introduction.ppt
 
Text Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEText Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATE
 
Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxNatural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptx
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introduction
 

Recently uploaded

How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 

Recently uploaded (20)

How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 

Introduction to NLP.pptx

  • 1. MR. JAYANAND KAMBLE WELCOME TO JAK’S TUTORIAL Prof. Jayanand Kamble 1
  • 2. CONTENTS: • Definition • Issues and strategies • Application domain • Tools for NLP • Linguistic organization of NLP • NLP vs PLP
  • 3. OVERVIEW • Human Evolution • How did Stone Age humans communicate? • Early humans could express thoughts and feelings through speech, signs, or gestures. They could signal with fire and smoke, drums, or whistles. • Why language? • Language lets us express our feelings and thoughts. It is unique to our species, and it carries the ideas and customs of different cultures and societies.
  • 4. • What are the world's spoken languages? • Around 7,000 languages are spoken in the world today.
  • 5. INDIAN CONTEXT • India is a multilingual country with great linguistic and cultural diversity • 22 official languages are listed in the Indian constitution • However, the 2001 Census of India reported: • 122 major languages • 1,599 other regional languages
  • 6. • 2,371 scripts • 30 languages are spoken by more than one million native speakers • 122 are spoken by more than 10,000 people • 20% of people understand English • 80% do not
  • 7. • Some Indian languages rank within the top 20 in the world by number of speakers. • Hindi: 4th (~350 million) • Bangla: 5th (~230 million) • Marathi: 10th (~84 million)
  • 9. DEFINITION: NLP • Humans communicate through some form of language, either text or speech. • For computers and humans to interact, computers need to understand the natural languages humans use. • Natural language processing is about making computers learn, understand, analyze, manipulate, and interpret natural (human) languages. • NLP stands for Natural Language Processing, a field at the intersection of computer science, linguistics, and artificial intelligence. • Natural Language Processing (NLP) is the capacity of a computer to "understand" natural language text at a level that allows meaningful interaction between the computer and a person working in a particular application domain.
  • 10. • Natural language processing is required when you want an intelligent system, such as a robot, to act on your instructions, or when you want to hear a decision from a dialogue-based clinical expert system. • The ability of machines to interpret human language is now at the core of many everyday applications: chatbots, email classification and spam filters, search engines, grammar checkers, voice assistants, and social language translators. • The input and output of an NLP system can be speech or written text.
  • 11. COMPONENTS OF NLP • NLP has two components: Natural Language Understanding (NLU) and Natural Language Generation (NLG). • NLU transforms human language into a machine-readable format. • It helps the machine understand and analyze human language by extracting information such as keywords, emotions, relations, and semantics from large volumes of text.
  • 12. • Natural Language Generation (NLG) acts as a translator that converts computerized data into a natural language representation. • It mainly involves text planning, sentence planning, and text realization. • NLU is generally considered harder than NLG.
  • 13. APPLICATION DOMAIN • Text processing: word processing, e-mail, spelling and grammar checkers • Interfaces to databases: query languages, information retrieval, data mining, text summarization • Expert systems: explanations, disease diagnosis • Linguistics: machine translation, content analysis, writers' assistants, language generation
  • 14. APPLICATION DOMAIN EXAMPLES • Search Autocorrect and Autocomplete • Language Translator • Social Media Monitoring: More and more people use social media to post their thoughts about a particular product, policy, or matter. • Chatbots: Customer service and experience are among the most important things for any company. • Survey Analysis: Surveys are an important way of evaluating a company’s performance. Companies conduct many surveys to get customers’ feedback on various products. • Targeted Advertising
  • 15. • Hiring and Recruitment: The Human Resources department is an integral part of every company. It has the important job of selecting the right employees. • Voice Assistants: You have probably already met them: Google Assistant, Apple Siri, and Amazon Alexa are all voice assistants. • Grammar Checkers: This is one of the most widely used applications of natural language processing. • Email Filtering: Have you ever used Gmail?
  • 16. • Sentiment Analysis: Natural language understanding is particularly difficult for machines when it comes to opinions, because humans often use sarcasm and irony. • Text Classification: Text classification, a text analysis task that includes sentiment analysis, involves automatically understanding, processing, and categorizing unstructured text. • Text Extraction: Text extraction, or information extraction, automatically detects specific information in a text, such as names, companies, and places. Detecting such named items is known as named entity recognition. • Machine Translation: Machine translation (MT) is one of the first applications of natural language processing.
  • 17. • Text Summarization: Natural language processing can summarize text in two ways: extraction-based summarization, which extracts key phrases and creates a summary without adding any extra information, and abstraction-based summarization, which creates new phrases that paraphrase the original source. • Market Intelligence: Marketers can use natural language processing to learn more about their customers and use those insights to create more effective strategies. • Intent Classification: Intent classification consists of identifying the goal or purpose that underlies a text.
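The extraction-based summarization mentioned on slide 17 can be illustrated with a minimal sketch. It is not taken from the slides: the word-frequency scoring, the tiny `STOP_WORDS` set, and the function names are assumptions chosen only for illustration. Each sentence is scored by the average frequency of its content words, and the top sentences are returned unchanged.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real systems use larger ones.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it", "on", "for"}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_tokens(text):
    """Lower-cased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_tokens(text))

    def score(sentence):
        tokens = content_tokens(sentence)
        return sum(freq[w] for w in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)

text = ("Machine translation converts text between languages. "
        "Translation quality depends on the translation model. "
        "The weather was pleasant yesterday.")
print(summarize(text, max_sentences=1))
```

Because the summary only copies existing sentences, nothing new is added to the text, which is the defining property of the extraction-based approach. An abstraction-based summarizer would need a generation model instead.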
  • 18. TOOLS FOR NLP • Programming languages and software: Prolog, ALE, Lisp/Scheme, C/C++, Python, Java • Statistical methods: Markov models, probabilistic grammars, text-based analysis • Abstract models: context-free grammars (BNF), attribute grammars, predicate calculus and other semantic models, knowledge-based and ontological methods
  • 19. TOOLS FOR NLP EXAMPLE • Natural Language Processing tools help companies get insights from unstructured text data such as emails, online reviews, and social media posts. • Many online tools make NLP accessible to a business. They fall into two groups: • open-source libraries and • SaaS tools.
  • 20. • Open-source: • Open-source libraries are free, flexible, and allow developers to fully customize them. • However, they are not free of effort: you need to spend time building and training open-source tools before you can reap the benefits. • SaaS tools • SaaS tools are a great alternative if you don’t want to invest a lot of time building complex infrastructure or spend money on extra resources.
  • 21. • MonkeyLearn: • A user-friendly, NLP-powered platform that helps you gain insights from text data. • To get started, you can use its pre-trained models for text analysis tasks such as sentiment analysis, topic classification, or keyword extraction. • Aylien: • A SaaS API that uses deep learning and NLP to analyze large volumes of text-based data, such as academic publications, real-time content from news outlets, and social media data. • It can be used for NLP tasks such as text summarization, article extraction, entity extraction, and sentiment analysis.
  • 22. • IBM Watson • IBM Watson is a suite of AI services hosted in the IBM Cloud. • One of its key features is Natural Language Understanding, which lets you identify and extract keywords, categories, emotions, entities, and more. • Amazon Comprehend: • Amazon Comprehend is an NLP service integrated with the Amazon Web Services infrastructure. • Its API can be used for NLP tasks such as sentiment analysis, topic modeling, and entity recognition.
  • 23. • Google Cloud • The Google Cloud Natural Language API provides several pre-trained models for sentiment analysis, content classification, and entity extraction, among others. • It offers AutoML Natural Language, which lets you build customized machine learning models. • As part of the Google Cloud infrastructure, it uses Google question-answering and language understanding technology.
  • 24. • NLTK • The Natural Language Toolkit (NLTK) for Python is one of the leading tools for building NLP models. • Focused on research and education in the NLP field, NLTK is backed by an active community, a range of tutorials for language processing, sample datasets, and resources that include a comprehensive handbook on language processing with Python.
  • 25. • Stanford CoreNLP • Stanford CoreNLP is a popular library built and maintained by the NLP community at Stanford University. • It is written in Java, so a JDK must be installed on the computer, but it has APIs in most programming languages. • The CoreNLP toolkit lets you perform a variety of NLP tasks, such as part-of-speech tagging, tokenization, or named entity recognition. • Its main advantages include scalability and optimization for speed, making it a good choice for complex tasks.
  • 26. • TextBlob • TextBlob is a Python library that works as an extension of NLTK, letting you perform the same NLP tasks through a more intuitive and user-friendly interface. • Its learning curve is gentler than that of other open-source libraries, so it is an excellent choice for beginners who want to tackle NLP tasks such as sentiment analysis, text classification, and part-of-speech tagging.
  • 27. • SpaCy • SpaCy is one of the newer open-source Python libraries for natural language processing. • It is fast, easy to use, well documented, and designed to support large volumes of data. It also ships a series of pretrained NLP models that make the job even easier. • Unlike NLTK or CoreNLP, which offer a number of algorithms for each task, SpaCy keeps its menu short and serves up the best available option for each task at hand.
  • 28. • Gensim • Gensim is a highly specialized Python library that largely deals with topic modeling, using algorithms such as Latent Dirichlet Allocation (LDA). • It is also good at recognizing text similarities, indexing texts, and navigating different documents. • The library is fast, scalable, and good at handling large volumes of data.
  • 29. NATURAL LANGUAGES VS. COMPUTER LANGUAGES • Ambiguity is the primary difference between natural and computer languages. • Formal programming languages are designed to be unambiguous, i.e. they can be defined by a grammar that produces a unique parse for each sentence in the language. • Programming languages are also designed for efficient parsing, i.e. their syntax is typically a deterministic context-free language (DCFL). • A sentence in a DCFL can be parsed in O(n) time, where n is the length of the string.
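To make the point of slide 29 concrete, here is a small sketch of a deterministic parser. The grammar (numbers, `+`, and parentheses) and all names are assumptions made for illustration, not something from the slides. The parser looks at one token at a time, never backtracks, and consumes each token exactly once, so it runs in O(n) and produces a single parse tree for each valid input.

```python
import re

# Assumed toy grammar (not from the slides):
#   expr -> term ('+' term)*
#   term -> NUMBER | '(' expr ')'

def tokenize(source):
    """Split the input into number, '+', '(' and ')' tokens."""
    tokens = re.findall(r"\d+|[+()]", source)
    if "".join(tokens) != source.replace(" ", ""):
        raise SyntaxError("unexpected character in input")
    return tokens

def parse(source):
    """Return the unique parse tree of `source` as nested tuples."""
    tokens = tokenize(source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def term():
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if token.isdigit():
            return int(advance())
        raise SyntaxError(f"unexpected token {token!r}")

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())  # left-associative by construction
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree

print(parse("1 + (2 + 3) + 4"))
```

A natural-language sentence such as "I ate the spaghetti with meatballs" has no comparable guarantee: more than one tree can be valid, and choosing between them requires context.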
  • 30. LINGUISTIC ORGANIZATION OF NLP • Phonetics and phonology (ध्वनीशास्त्र and उच्चारशास्त्र) • Morphology (मॉर्फोलॉजी) • Lexical Analysis (शाब्दिक विश्लेषण) • Syntactic Analysis (वाक्यरचना विश्लेषण) • Semantic Analysis (सिमेंटिक विश्लेषण) • Pragmatics (व्यावहारिकता) • Discourse (प्रवचन) • NLP architecture and stages of processing: ambiguity arises at every stage.
  • 31. PHONETICS (ध्वनीशास्त्र) • Processing of speech • Challenges • Homophones: bank (finance) vs. bank (river bank) • Near homophones: maatraa vs. maatra (Hindi) • Word boundary: aajaayenge can be read as aa jaayenge (will come) or aaj aayenge (will come today) • Phrase boundary: PhD students are especially exhorted to attend as such seminars are integral to one's post graduate education • Disfluency: ah, um, ahem, etc., as in “The best part of my job is … well … the best part of my job is the responsibility.”
  • 32. WORD SEGMENTATION • Breaking a string of characters (graphemes) into a sequence of words. • In some written languages (e.g. Chinese), words are not separated by spaces. • Even in English, characters other than white space can separate words, e.g. , ; . - : ( ) • Examples from English URLs: • jumptheshark.com → jump the shark com • myspace.com/pluckerswingbar → myspace com pluckers wing bar, or myspace com plucker swing bar
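The URL examples on slide 32 can be reproduced with a short dictionary-based sketch. The vocabulary below is a hypothetical toy set and the function name is an assumption. The function enumerates every way to split an unspaced string into known words, which shows directly why "pluckerswingbar" is ambiguous.

```python
def segmentations(text, vocabulary):
    """Return every way to split `text` into words from `vocabulary`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in vocabulary:
            # Keep this word and segment whatever is left.
            for rest in segmentations(text[end:], vocabulary):
                results.append([prefix] + rest)
    return results

# Hypothetical toy vocabulary, just large enough for the slide's examples.
VOCAB = {"plucker", "pluckers", "swing", "wing", "bar", "jump", "the", "shark"}

for words in segmentations("pluckerswingbar", VOCAB):
    print(" ".join(words))
```

This plain recursion can take exponential time on long inputs. A practical segmenter would memoize the results per suffix and rank the candidates, for example with word frequencies, rather than list them all.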
  • 33. MORPHOLOGICAL ANALYSIS • Morphology is the field of linguistics that studies the internal structure of words. • A morpheme is the smallest linguistic unit that has semantic meaning, i.e. a unit that cannot be further divided (e.g. in, come, and -ing, forming incoming). • Morphological analysis is the task of segmenting a word into its morphemes: • Carried → carry + ed (past tense) • Independently → in+(depend+ent)+ly • Googlers → (Google+er)+s (plural) • Unlockable → un+(lock+able)? The question mark flags an ambiguity: the word can also be read as (un+lock)+able.
  • 34. MORPHOLOGY • A collection of rules for forming words from root words • Nouns: plural (boy → boys); gender marking (Ravi → Ravina) • Verbs: tense (stretch → stretched); aspect (e.g. perfective: sit → had sat); modality (e.g. request: khaanaa → khaaiie) • A crucial first step in NLP • Marathi gloss on the slide: पद्धत (“method”)
  • 35. • Languages rich in morphology: e.g. Dravidian, Hungarian, Turkish, Indian languages. • Languages poor in morphology: e.g. Chinese, English. • Languages with rich morphology have the advantage of easier processing at higher stages of processing. • A task of interest to computer science: building finite-state machines for word morphology.
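The segmentations shown on slide 33 can be approximated with a rule-based sketch. The root lexicon, the affix lists, and the two spelling rules are all assumptions chosen to cover the slide's examples; this is not a full finite-state analyzer. The function strips one affix at a time and recurses until it reaches a known root.

```python
# Toy lexicon and affix lists; every entry is an illustrative assumption.
ROOTS = {"carry", "depend", "google", "lock", "boy", "stretch"}
PREFIXES = ["in", "un"]
SUFFIXES = ["ly", "ent", "able", "ed", "er", "s"]

def stem_candidates(stem):
    """Undo two common English spelling changes at a morpheme boundary."""
    candidates = [stem, stem + "e"]          # googl + er -> google + er
    if stem.endswith("i"):
        candidates.append(stem[:-1] + "y")   # carri + ed -> carry + ed
    return candidates

def analyze(word):
    """Return the morphemes of `word` as a list, or None if none is found."""
    word = word.lower()
    if word in ROOTS:
        return [word]
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem:
            for candidate in stem_candidates(stem):
                analysis = analyze(candidate)
                if analysis:
                    return analysis + ["-" + suffix]
    for prefix in PREFIXES:
        rest = word[len(prefix):]
        if word.startswith(prefix) and rest:
            analysis = analyze(rest)
            if analysis:
                return [prefix + "-"] + analysis
    return None

for word in ["Carried", "Independently", "Googlers", "Unlockable"]:
    print(word, "->", analyze(word))
```

The result is a flat list of morphemes, so it cannot express the bracketing question raised by "unlockable". Representing that ambiguity would require returning trees, and returning all analyses instead of the first one found.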
  • 36. LEXICAL ANALYSIS (शाब्दिक विश्लेषण) • Essentially refers to dictionary access and obtaining the properties of the word. • e.g. dog: noun (lexical property); takes 's' in the plural (morphological property); animate (semantic property); 4-legged (semantic property); carnivore (semantic property) • Challenge: lexical or word sense disambiguation
  • 37. LEXICAL DISAMBIGUATION • First step: part-of-speech disambiguation • Dog as a noun (animal) • Dog as a verb (to pursue or follow somebody closely) • Ex. A shadowy figure was dogging their every move. • Sense disambiguation: • Dog (as an animal) • Dog (as a very detestable person)
  • 38. • Sense disambiguation needs word relationships in a context: • The chair emphasized the need for adult education. (chair as a person, not furniture) • Ambiguity is very common in day-to-day communication, e.g. a satellite channel ad: “Watch what you want, when you want” (two senses of watch) • Watch: a wristwatch / watching something
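One simple way to use "word relationships in a context" for sense disambiguation is to compare the sentence with a short definition of each sense and pick the sense with the largest word overlap. This is a simplified form of the Lesk approach. The two glosses for "watch", the stop-word set, and the function names below are hand-written assumptions for illustration and do not come from a real dictionary.

```python
# Hand-written sense glosses; a real system would use a lexical resource.
SENSES = {
    "watch": {
        "wristwatch": "a small clock worn on the wrist to tell the time",
        "observe": "to look at a show film or channel on a screen for a period of time",
    },
}
STOP = {"a", "the", "to", "on", "of", "at", "or", "for", "my", "we", "i", "is", "in"}

def content_words(text):
    """Lower-cased words of `text` with stop words removed."""
    return {w for w in text.lower().split() if w not in STOP}

def disambiguate(word, sentence):
    """Pick the sense whose gloss shares the most words with the sentence."""
    context = content_words(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(disambiguate("watch", "we watch a film on the big screen"))
print(disambiguate("watch", "my watch shows the wrong time on my wrist"))
```

The ad slogan on slide 38 shows the limit of this idea: "Watch what you want, when you want" offers almost no context words, so an overlap count cannot separate the two senses.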
  • 39. TECHNOLOGICAL DEVELOPMENTS BRING IN NEW TERMS, ADDITIONAL MEANINGS/NUANCES FOR EXISTING TERMS • Justify, as in justify the right margin (word processing context) • Xeroxed: a new verb • Digital trace: a new expression • Communifaking: pretending to talk on a mobile phone when you are actually not • Discomgooglation: anxiety/discomfort at not being able to access the internet • Helicopter parenting: over-parenting
  • 40. AMBIGUITY OF MULTI-WORDS • The grandfather kicked the bucket after suffering from cancer. (kicked the bucket: to die) • This job is a piece of cake. (piece of cake: खूप सोपी असलेली गोष्ट; हातचा मळ, i.e. something very easy) • He is the dark horse of the match. (dark horse: someone who has a surprising ability or skill)
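  • 41. AMBIGUITY OF MULTI-WORDS: GOOGLE TRANSLATION • Google Translate renders the three idioms word for word in Marathi: • आजोबांनी कॅन्सरने ग्रासल्यानंतर बादलीला लाथ मारली. (Literally: after being afflicted by cancer, grandfather kicked the bucket.) • हे काम केकचा तुकडा आहे. (Literally: this job is a piece of cake.) • तो सामन्याचा डार्क हॉर्स आहे. (Literally: he is the dark horse of the match.)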
  • 42. SYNTACTIC ANALYSIS (वाक्यरचना विश्लेषण)
  • 43. 1. PART OF SPEECH (POS) TAGGING • Annotate each word in a sentence with a part-of-speech (PoS) tag • Useful for subsequent syntactic parsing and word sense disambiguation • Example: I/Pro ate/V the/Det spaghetti/N with/Prep meatballs/N.
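The tagging example on slide 43 can be reproduced with a minimal lookup tagger. The lexicon below is a hypothetical toy covering only that sentence, and the fallback tag for unknown words is an assumption. Each token is assigned the single tag stored for it.

```python
# Toy lexicon using the tag names from the slide (Pro, V, Det, N, Prep).
LEXICON = {
    "i": "Pro",
    "ate": "V",
    "the": "Det",
    "spaghetti": "N",
    "with": "Prep",
    "meatballs": "N",
}

def pos_tag(sentence, default="N"):
    """Tag each whitespace-separated token by dictionary lookup."""
    tokens = sentence.rstrip(".").split()
    # Unknown words fall back to `default` (an assumption: nouns are common).
    return [(token, LEXICON.get(token.lower(), default)) for token in tokens]

print(pos_tag("I ate the spaghetti with meatballs."))
```

A lookup table stores one tag per word, so it cannot handle words like "dog", which slide 37 shows as both noun and verb. Practical taggers choose the tag from the surrounding words, for example with the Markov models listed on slide 18.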
  • 44. 2. PHRASE CHUNKING • Phrase chunking is a phase of natural language processing that segments a sentence into its sub-constituents, such as noun, verb, and prepositional phrases, abbreviated NP, VP, and PP respectively.
  • 45. • Find all non-recursive noun phrases (NPs) and verb phrases (VPs) in a sentence: • [NP I] [VP ate] [NP the spaghetti] [PP with] [NP meatballs]. • [NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to] [NP only # 1.8 billion] [PP in] [NP September]
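The first chunking example on slide 45 can be derived from part-of-speech tags with a few grouping rules. The rules and names below are assumptions made for this sketch and cover only the tag set of slide 43: determiners, nouns, and pronouns form NPs, verbs form VPs, and each preposition forms its own PP.

```python
def chunk(tagged):
    """Group (word, tag) pairs into non-recursive NP, VP and PP chunks."""
    chunks = []
    for word, tag in tagged:
        if tag in ("Det", "N", "Pro"):
            label = "NP"
        elif tag == "V":
            label = "VP"
        elif tag == "Prep":
            label = "PP"
        else:
            label = "O"  # outside any chunk
        # A determiner or pronoun always opens a new NP; a preposition is
        # always a chunk of its own; otherwise extend the previous chunk.
        extends = (
            chunks
            and chunks[-1][0] == label
            and tag not in ("Det", "Pro")
            and label != "PP"
        )
        if extends:
            chunks[-1][1].append(word)
        else:
            chunks.append((label, [word]))
    return chunks

def render(chunks):
    """Format chunks in the bracket notation used on the slide."""
    return " ".join(f"[{label} {' '.join(words)}]" for label, words in chunks)

tagged = [("I", "Pro"), ("ate", "V"), ("the", "Det"),
          ("spaghetti", "N"), ("with", "Prep"), ("meatballs", "N")]
print(render(chunk(tagged)))
```

The chunks are flat: the sketch does not decide whether "with meatballs" attaches to "spaghetti" or to "ate". That attachment decision belongs to full syntactic parsing.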
  • 47. NLP VS PLP • Domain of discourse: NLP is broad (what can be expressed); PLP is narrow (what can be computed) • Lexicon: NLP large/complex; PLP small/simple • Grammatical constructs: NLP many and varied (declarative, interrogative, fragments, etc.); PLP few (declarative, imperative) • Meanings of an expression: NLP many; PLP one • Tools and techniques: NLP uses morphological analysis, syntactic analysis, semantic analysis, and integration of world knowledge; PLP uses lexical analysis, context-free parsing, code generation/compiling, and interpreting

Editor's Notes

  1. Whenever you search something on Google, after typing 2-3 letters, it shows you the possible search terms. Or, if you search for something with typos, it corrects them and still finds relevant results for you. Isn’t it amazing? Have you ever used Google Translate to find out what a particular word or phrase is in a different language? I’m sure it’s a YES!!