MR. JAYANAND KAMBLE
WELCOME TO JAK’S TUTORIAL
Prof. Jayanand Kamble
CONTENTS:
• Definition
• Issues and strategies
• Application domain
• Tools for NLP
• Linguistic organization of NLP
• NLP vs PLP
OVERVIEW
• Human Evolution
• How did Stone Age humans communicate?
• Early humans could express thoughts and feelings by means of speech, signs, or gestures. They could signal with fire and smoke, drums, or whistles.
• Why language?
• Language helps us express our feelings and thoughts. It is unique to our species and lets different cultures and societies express their own ideas and customs.
• What are the world's spoken languages?
• Around 7000 languages are spoken in the world today.
INDIAN CONTEXT
• India is a multilingual country with great linguistic and cultural diversity
• 22 official languages are mentioned in the Indian constitution
• However, the Census of India in 2001 reported:
• 122 major languages
• 1,599 other regional languages
• 2,371 scripts
• 30 languages are spoken by more than one million native
speakers
• 122 are spoken by more than 10,000 people
• 20% understand English
• 80% do not understand English
• Some Indian languages rank within the top 20 in the world by number of speakers.
• Hindi: 4th (~350 million)
• Bangla: 5th (~230 million)
• Marathi: 10th (~84 million)
DEFINITION: NLP
• Humans communicate through some form of language, either by text or by speech.
• For computers and humans to interact, computers need to understand the natural languages used by humans.
• Natural language processing is all about making computers learn, understand, analyze, manipulate, and interpret natural (human) languages.
• NLP stands for Natural Language Processing, a field at the intersection of computer science, linguistics, and artificial intelligence.
• Natural Language Processing (NLP) is the capacity of a computer to "understand" natural language text at a level that allows meaningful interaction between the computer and a person working in a particular application domain.
• Processing of natural language is required when you want an intelligent system such as a robot to act on your instructions, when you want to hear a decision from a dialogue-based clinical expert system, etc.
• The ability of machines to interpret human language is now at the core of many applications that we use every day: chatbots, email classification and spam filters, search engines, grammar checkers, voice assistants, and social language translators.
• The input and output of an NLP system can be speech or written text.
COMPONENTS OF NLP
• There are two components of NLP: Natural Language Understanding (NLU) and Natural Language Generation (NLG).
• Natural Language Understanding (NLU) involves transforming human language into a machine-readable format.
• It helps the machine to understand and analyze human language by extracting information such as keywords, emotions, relations, and semantics from large volumes of text.
• Natural Language Generation (NLG) acts as a translator that converts computerized data into a natural language representation.
• It mainly involves text planning, sentence planning, and text realization.
• NLU is generally considered harder than NLG.
APPLICATION DOMAIN
• text processing - word processing, e-mail, spelling and grammar
checkers
• interfaces to databases - query languages, information retrieval,
data mining, text summarization
• expert systems - explanations, disease diagnosis
• linguistics - machine translation, content analysis, writers'
assistants, language generation
APPLICATION DOMAIN EXAMPLES
• Search Autocorrect and Autocomplete:
• Language Translator:
• Social Media Monitoring: More and more people these days have started using
social media for posting their thoughts about a particular product, policy, or matter.
• Chatbots: Customer service and experience is the most important thing for any
company.
• Survey Analysis: Surveys are an important way of evaluating a company’s
performance. Companies conduct many surveys to get customer’s feedback on
various products.
• Targeted Advertising:
• Hiring and Recruitment: The Human Resource department is an integral part
of every company. They have the most important job of selecting the right
employees for a company.
• Voice Assistants: I am sure you’ve already met them, Google Assistant, Apple
Siri, Amazon Alexa, ring a bell? Yes, all of these are voice assistants.
• Grammar Checkers: This is one of the most widely used applications of
natural language processing.
• Email Filtering: Have you ever used Gmail?
• Sentiment Analysis: Natural language understanding is particularly difficult for
machines when it comes to opinions, given that humans often use sarcasm and
irony.
• Text Classification: Text classification, a text analysis task that also includes
sentiment analysis, involves automatically understanding, processing, and
categorizing unstructured text.
• Text Extraction: Text extraction, or information extraction, automatically detects
specific information in a text, such as names, companies, places, and more. Detecting
such names is known as named entity recognition.
• Machine Translation: Machine translation (MT) is one of the first applications of
natural language processing.
• Text Summarization: There are two ways of using natural language processing to
summarize data: extraction-based summarization ‒ which extracts key phrases
and creates a summary, without adding any extra information ‒ and abstraction-
based summarization, which creates new phrases paraphrasing the original
source.
• Market Intelligence: Marketers can benefit from natural language processing to
learn more about their customers and use those insights to create more effective
strategies.
• Intent Classification: Intent classification consists of identifying the goal or
purpose that underlies a text.
TOOLS FOR NLP
• Programming languages and software - Prolog, ALE, Lisp/Scheme,
C/C++, Python, Java
• Statistical Methods - Markov models, probabilistic grammars, text-
based analysis
• Abstract Models - Context-free grammars (BNF), Attribute grammars,
Predicate calculus and other semantic models, Knowledge-based and
ontological methods
TOOLS FOR NLP EXAMPLE
• Natural Language Processing tools are helping companies get
insights from unstructured text data like emails, online reviews,
social media posts, and more.
• There are many online tools that make NLP accessible to your
business, like
• open-source and
• SaaS.
• Open-source:
• Open-source libraries are free, flexible, and allow developers to fully
customize them.
• However, they are not always cost-effective in practice: you’ll need to spend
time building and training open-source tools before you can reap the
benefits.
• SaaS tools
• SaaS tools are a great alternative if you don’t want to invest a lot of
time building complex infrastructures or spend money on extra
resources.
• MonkeyLearn:
• Is a user-friendly, NLP-powered platform that helps to gain valuable insights
from text data.
• To get started, the pre-trained models can be used to perform text analysis
tasks such as sentiment analysis, topic classification, or keyword extraction.
• Aylien:
• Is a SaaS API that uses deep learning and NLP to analyze large volumes of text-
based data, such as academic publications, real-time content from news outlets
and social media data.
• It can be used for NLP tasks like text summarization, article extraction, entity
extraction, and sentiment analysis, among others.
• IBM Watson
• IBM Watson is a suite of AI services hosted in the IBM Cloud.
• One of its key features is Natural Language Understanding, which lets you
identify and extract keywords, categories, emotions, entities, and more.
• Amazon Comprehend:
• Amazon Comprehend is an NLP service, integrated with the Amazon
Web Services infrastructure.
• This API can be used for NLP tasks such as sentiment analysis, topic
modeling, entity recognition, and more.
• Google Cloud
• The Google Cloud Natural Language API provides several pre-
trained models for sentiment analysis, content classification, and
entity extraction, among others.
• It offers AutoML Natural Language, which lets you build
customized machine learning models.
• As part of the Google Cloud infrastructure, it uses Google
question-answering and language understanding technology.
• NLTK
• The Natural Language Toolkit (NLTK) with Python is one of
the leading tools in NLP model building.
• Focused on research and education in the NLP field, NLTK is
bolstered by an active community, as well as a range of
tutorials for language processing, sample datasets, and
resources that include a comprehensive handbook, Natural
Language Processing with Python.
• Stanford Core NLP
• Stanford Core NLP is a popular library built and maintained by the NLP
community at Stanford University.
• It’s written in Java, so a JDK must be installed on the computer, but it has
APIs for most programming languages.
• The Core NLP toolkit lets you perform a variety of NLP tasks, such as part-of-speech
tagging, tokenization, or named entity recognition.
• Some of its main advantages include scalability and optimization for
speed, making it a good choice for complex tasks.
• TextBlob
• TextBlob is a Python library that works as an extension of
NLTK, letting you perform the same NLP tasks through a much
more intuitive and user-friendly interface.
• Its learning curve is gentler than that of other open-source
libraries, so it’s an excellent choice for beginners who
want to tackle NLP tasks like sentiment analysis, text
classification, part-of-speech tagging, and more.
• SpaCy
• SpaCy is one of the newer open-source Python libraries for Natural
Language Processing.
• It’s fast, easy to use, well-documented, and designed to
support large volumes of data. It also offers a series of
pretrained NLP models that make the job even easier.
• Unlike NLTK or CoreNLP, which offer a number of algorithms for
each task, SpaCy keeps its menu short and serves up the best available
option for each task at hand.
• GenSim
• Gensim is a highly specialized Python library that largely deals
with topic modeling tasks using algorithms like Latent Dirichlet
Allocation (LDA).
• It’s also excellent at recognizing text similarities, indexing texts,
and navigating different documents.
• This library is fast, scalable, and good at handling large volumes of
data.
NATURAL LANGUAGES VS. COMPUTER LANGUAGES
• Ambiguity is the primary difference between natural and computer
languages.
• Formal programming languages are designed to be unambiguous, i.e.
they can be defined by a grammar that produces a unique parse for
each sentence in the language.
• Programming languages are also designed for efficient parsing, i.e.
they are deterministic context free languages
• A sentence in a DCFL can be parsed in O( n ) time where n is the length of the
string.
LINGUISTIC ORGANIZATION OF NLP
• Phonetics and phonology
• Morphology
• Lexical Analysis
• Syntactic Analysis
• Semantic Analysis
• Pragmatics
• Discourse
NLP architecture and stages of processing (ambiguity at every stage): phonetics and phonology, morphology, lexical analysis, syntactic analysis, semantic analysis, pragmatics, discourse
PHONETICS
• Processing of speech
• Challenges
• Homophones: bank (finance) vs. bank (river bank)
• Near Homophones: maatraa vs. maatra (Hin)
• Word Boundary: aajaayenge: aa jaayenge (will come) or aaj aayenge (will
come today)
• Phrase boundary
• PhD students are especially exhorted to attend as such seminars are integral to one's
post graduate education
• Disfluency: ah, um, ahem etc.
• The best part of my job is … well … the best part of my job is the responsibility.
WORD SEGMENTATION
• Breaking a string of characters (graphemes) into a sequence of words.
• In some written languages (e.g. Chinese), words are not separated by
spaces.
• Even in English, characters other than white space can be used to
separate words, e.g. , ; . - : ( )
• Examples from English URLs:
• jumptheshark.com → jump the shark com
• myspace.com/pluckerswingbar →
• myspace com pluckers wing bar
• myspace com plucker swing bar
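The URL examples above can be reproduced with a small dictionary-based segmenter. This is only a sketch: the vocabulary `VOCAB` and the function name `segmentations` are assumptions made for illustration, and a real system would use a large lexicon plus word frequencies to rank the candidates.

```python
from functools import lru_cache

# Hypothetical toy vocabulary, invented for this illustration.
VOCAB = frozenset({"jump", "the", "shark", "plucker", "pluckers",
                   "swing", "wing", "bar"})


def segmentations(text, vocab=VOCAB):
    """Return every way to split `text` into a sequence of words from `vocab`."""

    @lru_cache(maxsize=None)
    def split(start):
        if start == len(text):
            return [[]]  # exactly one way to segment the empty remainder
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in vocab:
                # Combine this word with every segmentation of the rest.
                for rest in split(end):
                    results.append([word] + rest)
        return results

    return split(0)


print(segmentations("jumptheshark"))
print(segmentations("pluckerswingbar"))
```

The second call returns two segmentations (plucker swing bar and pluckers wing bar), which is exactly the ambiguity the slide points out: the dictionary alone cannot decide between them.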
MORPHOLOGICAL ANALYSIS
• Morphology is the field of linguistics that studies the internal structure of
words.
• A morpheme is the smallest linguistic unit that has semantic meaning and cannot be further divided.
• e.g. in-, come, and -ing, which together form incoming
• Morphological analysis is the task of segmenting a word into its
morphemes
• Carried → carry + ed (past tense)
• Independently → in+(depend+ent)+ly
• Googlers → (Google+er)+s (plural)
• Unlockable → un+(lock+able)?
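The segmentations listed above can be approximated with a toy affix-stripping routine. This is a minimal sketch under stated assumptions: the stem list, the affix lists, and the function name `analyze` are all invented here, and only two English spelling changes are handled.

```python
# Hypothetical toy lexicon and affix lists, invented for this illustration.
STEMS = {"carry", "google", "lock", "depend"}
PREFIXES = ["un", "in"]
SUFFIXES = ["s", "ed", "er", "ly", "ent", "able"]


def analyze(word):
    """Return a flat list of morphemes for `word`, or None if none is found."""
    word = word.lower()
    if word in STEMS:
        return [word]
    # Undo two common spelling changes: carri -> carry, googl -> google.
    if word.endswith("i") and word[:-1] + "y" in STEMS:
        return [word[:-1] + "y"]
    if word + "e" in STEMS:
        return [word + "e"]
    # Peel one suffix off the end and analyze what is left.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            rest = analyze(word[: -len(suffix)])
            if rest:
                return rest + [suffix]
    # Otherwise peel one prefix off the front.
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix):
            rest = analyze(word[len(prefix):])
            if rest:
                return [prefix] + rest
    return None


for w in ["Carried", "Independently", "Googlers", "Unlockable"]:
    print(w, "->", analyze(w))
```

The output is a flat list, so it does not record bracketing. For "unlockable" it cannot distinguish un+(lock+able), "not able to be locked", from (un+lock)+able, "able to be unlocked", which is the ambiguity the question mark on the slide hints at.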
MORPHOLOGY
• Collection of “Word formation rules from root words”
• Nouns: Plural (boy → boys); Gender marking (Ravi → Ravina)
• Verbs: Tense (stretch → stretched); Aspect (e.g. perfective: sit →
had sat); Modality (e.g. request: khaanaa → khaaiie)
• Crucial first step in NLP
• Languages rich in morphology: e.g. Dravidian languages, Hungarian, Turkish, Indian
languages.
• Languages poor in morphology: Chinese, English.
• Languages with rich morphology have the advantage of easier
processing at higher stages of processing.
• A task of interest to computer science: building finite-state
machines for word morphology.
LEXICAL ANALYSIS
• Essentially refers to dictionary access and obtaining
the properties of the word.
• Challenge:
• Lexical or word sense
disambiguation
e.g. dog
• noun (lexical property)
• takes 's' in the plural (morphological property)
• animate (semantic property)
• 4-legged (semantic property)
• carnivore (semantic property)
LEXICAL DISAMBIGUATION
• First step: Part of Speech Disambiguation
• Dog as a noun (Animal)
• Dog as a verb (to pursue or to go after or to follow somebody
closely.)
• Ex. A shadowy figure was dogging their every move.
• Sense Disambiguation:
• Dog ( as animal)
• Dog ( as a very detestable person)
• Needs word relationships in a context:
• The chair emphasized the need for adult education.
• Very common in day to day communications:
“Satellite Channel Ad: Watch what you want, when you
want”
(two senses of watch)
• Watch: wristwatch / watching something
TECHNOLOGICAL DEVELOPMENTS BRING IN NEW TERMS,
ADDITIONAL MEANINGS/NUANCES FOR EXISTING TERMS
• Justify as in justify the right margin (word processing context)
• Xeroxed : a new verb
• Digital Trace : a new expression
• Communifaking : pretending to talk on mobile when you are actually
not
• Discomgooglation : anxiety/discomfort at not being able to access
internet
• Helicopter Parenting : over parenting
AMBIGUITY OF MULTI-WORDS
• The grandfather kicked the bucket after suffering from
cancer. (kicked the bucket: to die)
• This job is a piece of cake. (a piece of cake: something very easy; child's play)
• He is the dark horse of the match. (dark horse: someone who has a surprising ability or
skill)
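AMBIGUITY OF MULTI-WORDS: GOOGLE
TRANSLATION
• आजोबांनी कॅन्सरने ग्रासल्यानंतर बादलीला लाथ मारली. (literally: "After being afflicted by cancer, grandfather kicked the pail.")
• हे काम केकचा तुकडा आहे. (literally: "This job is a slice of cake.")
• तो सामन्याचा डार्क हॉर्स आहे. ("dark horse" is transliterated rather than translated)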
SYNTACTIC ANALYSIS
1. PART OF SPEECH (POS) TAGGING
• Annotate each word in a sentence with a PoS
• Useful for subsequent syntactic parsing and word
sense disambiguation
I ate the spaghetti with meatballs.
Pro V Det N Prep N
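The tagging shown above can be imitated with a dictionary lookup. This is a deliberately naive sketch: the lexicon `LEXICON` and the function `pos_tag_toy` are hypothetical, each word gets a single fixed tag, and unknown words default to N. Practical taggers (for example the trained tagger shipped with NLTK) choose tags from context instead.

```python
# Hypothetical word -> tag lexicon using the tag names from the slide.
LEXICON = {
    "i": "Pro",
    "ate": "V",
    "the": "Det",
    "spaghetti": "N",
    "with": "Prep",
    "meatballs": "N",
}


def pos_tag_toy(sentence):
    """Tag each whitespace-separated token by lexicon lookup (default: N)."""
    tokens = sentence.rstrip(".").split()
    return [(token, LEXICON.get(token.lower(), "N")) for token in tokens]


print(pos_tag_toy("I ate the spaghetti with meatballs."))
```

A lookup like this cannot handle words whose tag depends on context, which is why part-of-speech disambiguation is treated as a separate problem.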
2. PHRASE CHUNKING
• Phrase chunking is a phase of natural language
processing that separates and segments a
sentence into its sub-constituents, such as
noun, verb, and prepositional phrases,
abbreviated as NP, VP, and PP, respectively.
• Find all non-recursive noun phrases (NPs) and verb
phrases (VPs) in a sentence
[NP I] [VP ate] [NP the spaghetti] [PP with] [NP
meatballs].
[NP He ] [VP reckons] [NP the current account deficit
] [VP will narrow] [PP to ] [NP only # 1.8 billion ] [PP
in ] [NP September ]
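The first bracketed example above can be produced from part-of-speech tags with a few grouping rules. This is a sketch, not a standard algorithm: the tag-to-chunk mapping `CHUNK_OF` and the merging rules are assumptions chosen to fit this one sentence.

```python
# Hypothetical mapping from POS tag to chunk label.
CHUNK_OF = {"Pro": "NP", "Det": "NP", "N": "NP", "V": "VP", "Prep": "PP"}


def chunk(tagged):
    """Group (word, tag) pairs into flat, non-recursive chunks."""
    chunks = []
    for word, tag in tagged:
        label = CHUNK_OF[tag]
        if chunks and chunks[-1][0] == label == "NP" and tag == "N":
            chunks[-1][1].append(word)   # a noun continues the open NP
        elif chunks and chunks[-1][0] == label == "VP":
            chunks[-1][1].append(word)   # consecutive verbs form one VP
        else:
            chunks.append((label, [word]))
    return chunks


def bracket(chunks):
    """Render chunks in the bracketed notation used on the slide."""
    return " ".join(f"[{label} {' '.join(words)}]" for label, words in chunks)


tagged = [("I", "Pro"), ("ate", "V"), ("the", "Det"),
          ("spaghetti", "N"), ("with", "Prep"), ("meatballs", "N")]
print(bracket(chunk(tagged)))
```

Because the chunks are flat, the prepositional phrase contains only the preposition and the following noun phrase is a separate chunk, matching the non-recursive convention in the examples.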
SYNTAX PROCESSING STAGE
I like mangoes
PARSER
• The word ‘parsing’ comes from the Latin word ‘pars’ (which
means ‘part’); parsing determines the grammatical structure of a text by
breaking it into its parts.
• It is also called syntactic analysis or syntax analysis.
• Syntax analysis checks the text for well-formedness against the rules of a
formal grammar.
• A scrambled sentence like “me ice-cream hot give”, for example, would be
rejected by the parser or syntactic analyzer. By contrast, “Give me hot ice-cream”
is grammatically well-formed; its oddness is a matter of meaning, which
semantic analysis detects.
• Parsing (Def):
• The process of analyzing a string of symbols in
natural language to check that it conforms to the rules of a formal
grammar.
RELEVANCE OF PARSING IN NLP
• A parser is used to report any syntax error.
• It helps to recover from commonly occurring errors so that the
processing of the remainder of the input can be continued.
• A parse tree is created with the help of a parser.
• A parser is used to create a symbol table, which plays an important role
in NLP.
• A parser is also used to produce intermediate representations (IR).
• Parsing Strategy (Driven by grammar)
• S-> NP VP
• NP-> N | PRON
• VP-> V NP | V PP
• N-> Mangoes
• PRON-> I
• V-> like
I LIKE MANGOES
PRO V NOUN
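The grammar-driven strategy above can be sketched as a small top-down (recursive-descent) parser with backtracking. The code is an illustration under assumptions: the function names are invented, words are lower-cased, and the "V PP" alternative is left out because the grammar on the slide gives no rule for PP.

```python
# The slide's grammar as a dict: nonterminal -> list of alternative productions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"], ["PRON"]],
    "VP": [["V", "NP"]],        # "V PP" omitted: PP has no rule in the grammar
    "N": [["mangoes"]],
    "PRON": [["i"]],
    "V": [["like"]],
}


def parse(symbol, tokens, pos):
    """Yield (tree, next_pos) for each way `symbol` matches tokens from `pos`."""
    if symbol not in GRAMMAR:                      # terminal: must match a word
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for production in GRAMMAR[symbol]:
        yield from expand(symbol, production, tokens, pos)


def expand(symbol, production, tokens, pos):
    """Match each part of one production in order, keeping all partial results."""
    states = [([], pos)]
    for part in production:
        states = [(children + [tree], nxt)
                  for children, p in states
                  for tree, nxt in parse(part, tokens, p)]
    for children, end in states:
        yield (symbol, children), end


def parse_sentence(sentence):
    """Return a parse tree covering the whole sentence, or None."""
    tokens = sentence.lower().split()
    for tree, end in parse("S", tokens, 0):
        if end == len(tokens):
            return tree
    return None


print(parse_sentence("I like mangoes"))
print(parse_sentence("like I mangoes"))   # not derivable from the grammar
```

The first call returns a nested tuple mirroring the tree S → NP (PRON I) VP (V like, NP (N mangoes)); the second returns None, which is how a parser of this kind reports a syntax error.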
• NLTK for NLP
• “A clever fox was jumping over the wall”
CHALLENGES IN SYNTACTIC PROCESSING:
STRUCTURAL AMBIGUITY
• Preposition Phrase Attachment
• I saw the boy with a telescope.
(who has the telescope?)
• I saw the mountain with a telescope.
(world knowledge: mountain cannot be an instrument of
seeing)
• I saw the boy with the pony tail.
(world knowledge: a pony tail cannot be an instrument of
seeing)
SEMANTIC TASKS
• Representation in terms of
• Predicate calculus/Semantic Nets/Frames/Conceptual Dependencies and
Scripts
• John gave a book to Mary
• Give: action, Agent: John, Object: Book, Recipient: Mary
• Challenge : ambiguity in semantic role labeling
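• (Eng) Visiting aunts can be a nuisance (the aunts who visit, or the act of visiting aunts?)
• (Hin) aapko mujhe mithaai khilaanii padegii (either "you will have to feed me sweets" or "I will have to feed you sweets"); ambiguous in Marathi
too.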
PREDICATE CALCULUS
• A predicate is an expression of one or more variables
defined on some specific domain.
• A predicate with variables can be made a proposition
by either assigning a value to the variable or by
quantifying the variable.
• Consider the following statement.
Ram is a student.
• Now consider the above statement in terms of Predicate
calculus.
Here "is a student" is a predicate and Ram is subject.
• Let's denote "Ram" as x and "is a student" as a predicate P
then we can write the above statement as P(x).
• Generally, a statement expressed by Predicate must have at
least one object associated with Predicate.
• In this case, Ram is the required object associated with
predicate P.
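The two ways of turning a predicate into a proposition (assigning a value, or quantifying the variable) can be sketched in a few lines. The domain and the set of students below are hypothetical data invented for the example.

```python
# Hypothetical domain of discourse and facts, invented for this illustration.
DOMAIN = {"Ram", "Sita", "John"}
STUDENTS = {"Ram", "Sita"}


def P(x):
    """Predicate P(x): 'x is a student'."""
    return x in STUDENTS


# Assigning a value to the variable gives a proposition: P(Ram).
ram_is_student = P("Ram")

# Quantifying the variable also gives a proposition.
some_student = any(P(x) for x in DOMAIN)   # there exists x such that P(x)
all_students = all(P(x) for x in DOMAIN)   # for all x, P(x)

print(ram_is_student, some_student, all_students)
```

With this data, P(Ram) and the existential statement are true, while the universal statement is false because John is in the domain but is not a student.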
WORD SENSE DISAMBIGUATION (WSD)
• Words in natural language usually have a fair number of
different possible meanings
• Ravi has a strong interest in computer science.
• Ravi pays a large amount of interest on his credit card.
• For many tasks (such as question answering, translation) the
proper sense of each ambiguous word in a sentence must be
determined.
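One classic way to pick a sense is to compare the words around the ambiguous word with a short definition (gloss) of each sense, in the spirit of the simplified Lesk algorithm. The sketch below is an assumption-laden illustration: the two glosses, the stopword list, and the function name are invented here, whereas a real system would take glosses from a dictionary such as WordNet.

```python
# Hypothetical sense inventory with hand-written glosses.
SENSES = {
    "interest": {
        "curiosity": "a feeling of wanting to learn more about a subject "
                     "such as science or art",
        "finance": "money paid regularly for the use of borrowed money "
                   "such as a loan or credit card debt",
    }
}
STOPWORDS = {"a", "of", "the", "in", "on", "his", "has", "for", "to",
             "such", "as", "or", "about", "more"}


def simplified_lesk(word, sentence):
    """Pick the sense whose gloss shares the most content words with the sentence."""
    context = set(sentence.lower().rstrip(".").split()) - STOPWORDS
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & (set(gloss.split()) - STOPWORDS))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("interest", "Ravi has a strong interest in computer science."))
print(simplified_lesk("interest", "Ravi pays a large amount of interest on his credit card."))
```

The first sentence shares "science" with the curiosity gloss and the second shares "credit" and "card" with the finance gloss, so the two occurrences of "interest" receive different senses.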
PRAGMATICS/DISCOURSE TASKS
PRAGMATICS
• Very hard problem
• Model user intention
• Tourist (in a hurry, checking out of the hotel, motioning to
the service boy): Boy, go upstairs and see if my sandals are
under the divan. Do not be late. I just have 15 minutes to
catch the train.
• Boy (running upstairs and coming back panting): yes sir,
they are there.
DISCOURSE
• Processing of sequence of sentences.
• Mother to John: “John go to school. It is open today. Should you bunk? Father will be very
angry.”
• Ambiguity of
• open
• bunk what?
• Why will the father be angry?
• Complex chain of reasoning and application of world knowledge
• Ambiguity of father
• father as parent or father as headmaster
• Coreference: determine which phrases in a document refer to the same
underlying entity
• John put the carrot on the plate and ate it ("it" refers to the carrot, not the plate)
APPLICATION OF NLP
INFORMATION EXTRACTION (IE)
• Identify phrases in language that refer to specific types of entities and
relations in text
• Named entity recognition is the task of identifying names of people, places,
organizations, etc. in text.
Michael Dell is the CEO of Dell Computer Corporation and lives in
Austin, Texas.
• Relation extraction identifies specific relations between entities.
Michael Dell is the CEO of Dell Computer Corporation and lives in
Austin, Texas.
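Both tasks can be imitated on the example sentence with a name list and one regular expression. This is a toy sketch: the gazetteer, the label names, and the single "is the CEO of" pattern are assumptions made for illustration, while practical systems use trained models rather than fixed lists.

```python
import re

TEXT = ("Michael Dell is the CEO of Dell Computer Corporation "
        "and lives in Austin, Texas.")

# Hypothetical gazetteer (name list) mapping known names to entity types.
GAZETTEER = {
    "Michael Dell": "PERSON",
    "Dell Computer Corporation": "ORGANIZATION",
    "Austin": "PLACE",
    "Texas": "PLACE",
}

# One hand-written pattern for a single relation type.
NAME = r"[A-Z]\w+(?: [A-Z]\w+)*"
CEO_PATTERN = re.compile(rf"(?P<person>{NAME}) is the CEO of (?P<org>{NAME})")


def find_entities(text):
    """Return (name, type) pairs for gazetteer names, in order of appearance."""
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), name, label))
    return [(name, label) for _, name, label in sorted(found)]


def find_ceo_relation(text):
    """Return ('ceo_of', person, organization) if the pattern matches, else None."""
    match = CEO_PATTERN.search(text)
    if match is None:
        return None
    return ("ceo_of", match.group("person"), match.group("org"))


print(find_entities(TEXT))
print(find_ceo_relation(TEXT))
```

Note that "Dell" occurs both inside a person name and inside an organization name; the gazetteer handles this only because both full names are listed, which hints at why entity recognition needs context in general.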
QUESTION ANSWERING
• Directly answer natural language questions based on information
presented in a corpus of textual documents (e.g. the web)
• When was Barack Obama born?
• August 4, 1961
• Who was president when Barack Obama was born?
• John F. Kennedy
• How many presidents have there been since Barack Obama was
born?
• 9
SENTIMENT ANALYSIS
• Extract subjective information usually from a set of
documents, often using online reviews to
determine "polarity" about specific objects.
• Especially useful for identifying trends of public
opinion in the social media, for the purpose of
marketing.
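Polarity can be estimated, very roughly, by counting opinion words. The sketch below assumes two tiny hand-made word lists and an invented function name; it is a baseline for illustration only and will misjudge negation, sarcasm, and irony.

```python
import re

# Hypothetical opinion lexicons, invented for this illustration.
POSITIVE = {"good", "great", "excellent", "love", "fast"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}


def polarity(review):
    """Classify a review by counting positive and negative words."""
    words = re.findall(r"[a-z']+", review.lower())
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for review in ["Great phone, I love the camera",
               "Terrible battery and slow charging",
               "It arrived on Monday"]:
    print(review, "->", polarity(review))
```

Running the same function over a set of reviews and tallying the labels gives a crude picture of opinion about a product.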
MACHINE TRANSLATION (MT)
• Translate a sentence from one natural language to
another.
• Ex.
• English: I would like to take admission in MGM
University
• Marathi: मला एमजीएम विद्यापीठात प्रवेश घ्यायचा आहे
• Hindi: मैं एमजीएम विश्वविद्यालय में प्रवेश लेना चाहता हूं
AMBIGUITY RESOLUTION IS REQUIRED FOR
TRANSLATION
• Syntactic and semantic ambiguities must be
properly resolved for correct translation:
• “Jak plays the guitar.” → "जैक गिटार बजाता है।"
• “Jak plays soccer.” → "जैक फुटबॉल खेलता है।"
NLP HISTORY
EARLY HISTORY: 1950’S
• Shannon (the father of information theory) explored probabilistic models of
natural language (1951)
• Chomsky (the extremely influential linguist) developed formal models of syntax,
i.e. finite state and context-free grammars (1956)
• First computational parser developed at UPenn as a cascade of finite-state
transducers (Joshi, 1961; Harris, 1962)
• Bayesian methods developed for optical character recognition (OCR) (Bledsoe &
Browning, 1959)
HISTORY: 1960’S
• Work at MIT AI lab on question answering (BASEBALL) and dialog
(ELIZA)
• Semantic network models of language for question answering
(Simmons, 1965).
• First electronic corpus collected, Brown corpus, 1 million words
(Kucera and Francis, 1967)
• Bayesian methods used to identify document authorship (The
Federalist papers) (Mosteller & Wallace, 1964)
HISTORY: 1970’S
• “Natural language understanding” systems developed that tried to support
deeper semantic interpretation
• SHRDLU (Winograd, 1972) performs tasks in the “blocks world” based on NL
instruction
• Schank et al. (1972, 1977) developed systems for conceptual representation of
language and for understanding short stories using hand-coded knowledge of
scripts, plans, and goals.
• Prolog programming language developed to support logic-based parsing
(Colmerauer, 1975).
• Initial development of hidden Markov models (HMMs) for statistical speech
recognition (Baker, 1975; Jelinek, 1976)
HISTORY: 1980’S
• Development of more complex (mildly context-sensitive)
grammatical formalisms, e.g. unification grammar, tree-adjoining
grammar, etc.
• Symbolic work on discourse processing and NL generation.
• Initial use of statistical (HMM) methods for syntactic analysis
(POS tagging) (Church, 1988).
HISTORY: 1990’S
• Rise of statistical methods and empirical evaluation causes a “scientific revolution”
in the field.
• Initial annotated corpora developed for training and testing systems for POS
tagging, parsing, WSD, information extraction, MT, etc.
• First statistical machine translation systems developed at IBM for Canadian
Hansards corpus (Brown et al., 1990)
• First robust statistical parsers developed (Magerman, 1995; Collins, 1996;
Charniak, 1997)
• First systems for robust information extraction developed (e.g. MUC
competitions)
HISTORY: 2000 ONWARDS
• Information extraction from social networks
• Information retrieval
• Cross-lingual information access
• Machine Translation (statistical, hybrid etc.)
• Biomedical text mining
• Discourse processing
NLP VS PLP
Aspect | NLP | PLP
Domain of discourse | broad: what can be expressed | narrow: what can be computed
Lexicon | large/complex | small/simple
Grammatical constructs | many and varied: declarative, interrogative, fragments, etc. | few: declarative, imperative
Meanings of an expression | many | one
Tools and techniques | morphological analysis, syntactic analysis, semantic analysis, integration of world knowledge | lexical analysis, context-free parsing, code generation/compiling, interpreting
UNIT 01 END
Prof. Jayanand Kamble
78

More Related Content

Similar to Introduction to NLP_1.pptx

NLP,expert,robotics.pptx
NLP,expert,robotics.pptxNLP,expert,robotics.pptx
NLP,expert,robotics.pptxAmanBadesra1
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingVeenaSKumar2
 
Building NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML GroupBuilding NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML Groupbotsplash.com
 
NLP presentation.pptx
NLP presentation.pptxNLP presentation.pptx
NLP presentation.pptxpysgpa
 
Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEDiana Maynard
 
Natural language processing and search
Natural language processing and searchNatural language processing and search
Natural language processing and searchNathan McMinn
 
naturallanguageprocessing-160722053804.pdf
naturallanguageprocessing-160722053804.pdfnaturallanguageprocessing-160722053804.pdf
naturallanguageprocessing-160722053804.pdfshakeelAsghar6
 
Natural Language Processing-(NLP).pptx
Natural Language Processing-(NLP).pptxNatural Language Processing-(NLP).pptx
Natural Language Processing-(NLP).pptxSHIBDASDUTTA
 
NLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
NLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnNLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
NLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnRAtna29
 
Natural language processing
Natural language processingNatural language processing
Natural language processingHansi Thenuwara
 
GATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaGATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaDiana Maynard
 
An Overview of Natural Language Processing.pptx
An Overview of Natural Language Processing.pptxAn Overview of Natural Language Processing.pptx
An Overview of Natural Language Processing.pptxSoftxai
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)Kuppusamy P
 
Text Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEText Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEDiana Maynard
 
Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxNatural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxSHIBDASDUTTA
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introductionananth
 

Similar to Introduction to NLP_1.pptx (20)

NLP,expert,robotics.pptx
NLP,expert,robotics.pptxNLP,expert,robotics.pptx
NLP,expert,robotics.pptx
 
sample PPT.pptx
sample PPT.pptxsample PPT.pptx
sample PPT.pptx
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Building NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML GroupBuilding NLP solutions for Davidson ML Group
Building NLP solutions for Davidson ML Group
 
NLP presentation.pptx
NLP presentation.pptxNLP presentation.pptx
NLP presentation.pptx
 
Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATE
 
Natural language processing and search
Natural language processing and searchNatural language processing and search
Natural language processing and search
 
naturallanguageprocessing-160722053804.pdf
naturallanguageprocessing-160722053804.pdfnaturallanguageprocessing-160722053804.pdf
naturallanguageprocessing-160722053804.pdf
 
Natural Language Processing-(NLP).pptx
Natural Language Processing-(NLP).pptxNatural Language Processing-(NLP).pptx
Natural Language Processing-(NLP).pptx
 
NLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
NLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnNLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
NLP-ppt.pptx nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
GATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaGATE: a text analysis tool for social media
GATE: a text analysis tool for social media
 
NLP.pptx
NLP.pptxNLP.pptx
NLP.pptx
 
WomenTech_Event
WomenTech_EventWomenTech_Event
WomenTech_Event
 
An Overview of Natural Language Processing.pptx
An Overview of Natural Language Processing.pptxAn Overview of Natural Language Processing.pptx
An Overview of Natural Language Processing.pptx
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
 
1 Introduction.ppt
1 Introduction.ppt1 Introduction.ppt
1 Introduction.ppt
 
Text Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEText Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATE
 
Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxNatural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptx
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introduction
 

Introduction to NLP_1.pptx

  • 1. MR. JAYANAND KAMBLE WELCOME TO JAK’S TUTORIAL Prof. Jayanand Kamble 1
  • 2. CONTENTS: • Definition • Issues and strategies • Application domain • Tools for NLP • Linguistic organization of NLP • NLP vs PLP
  • 3. OVERVIEW • Human Evolution • How did Stone Age humans communicate? • Early humans could express thoughts and feelings by means of speech or by signs or gestures. They could signal with fire and smoke, drums, or whistles. • Why language? • Language helps us express our feelings and thoughts. It is unique to our species because it is a way to express unique ideas and customs within different cultures and societies.
  • 4. • What are the world's spoken languages? • Around 7000 languages are spoken in the world today.
  • 5. INDIAN CONTEXT • India is a multilingual country with great linguistic and cultural diversity • 22 official languages are listed in the Indian constitution • However, the 2001 Census of India reported: • 122 major languages • 1,599 other regional languages
  • 6. • 2,371 scripts • 30 languages are spoken by more than one million native speakers • 122 are spoken by more than 10,000 people • 20% understand English • 80% cannot understand English
  • 7. • Some Indian languages rank within the top 20 in the world by number of speakers. • Hindi: 4th (~350 million) • Bangla: 5th (~230 million) • Marathi: 10th (~84 million)
  • 9. DEFINITION: NLP • Humans communicate through some form of language, either text or speech. • For computers and humans to interact, computers need to understand the natural languages humans use. • Natural language processing is about making computers learn, understand, analyze, manipulate and interpret natural (human) languages. • NLP stands for Natural Language Processing, a field at the intersection of computer science, linguistics, and artificial intelligence. • Natural Language Processing (NLP) is the capacity of a computer to "understand" natural language text at a level that allows meaningful interaction between the computer and a person working in a particular application domain.
  • 10. • Natural language processing is required when you want an intelligent system such as a robot to follow your instructions, when you want to hear a decision from a dialogue-based clinical expert system, etc. • The ability of machines to interpret human language is now at the core of many applications we use every day: chatbots, email classification and spam filters, search engines, grammar checkers, voice assistants, and social language translators. • The input and output of an NLP system can be speech or written text.
  • 11. COMPONENTS OF NLP • NLP has two components: Natural Language Understanding (NLU) and Natural Language Generation (NLG). • NLU transforms human language into a machine-readable format. • It helps the machine understand and analyze human language by extracting information such as keywords, emotions, relations, and semantics from large volumes of text.
  • 12. • Natural Language Generation (NLG) acts as a translator that converts computerized data into a natural language representation. • It mainly involves text planning, sentence planning, and text realization. • NLU is harder than NLG.
  • 13. APPLICATION DOMAIN • Text processing: word processing, e-mail, spelling and grammar checkers • Interfaces to databases: query languages, information retrieval, data mining, text summarization • Expert systems: explanations, disease diagnosis • Linguistics: machine translation, content analysis, writers' assistants, language generation
  • 14. APPLICATION DOMAIN EXAMPLES • Search Autocorrect and Autocomplete • Language Translator • Social Media Monitoring: More and more people use social media to post their thoughts about a particular product, policy, or matter. • Chatbots: Customer service and experience are among the most important things for any company. • Survey Analysis: Surveys are an important way of evaluating a company's performance. Companies conduct many surveys to get customers' feedback on various products. • Targeted Advertising
  • 15. • Hiring and Recruitment: The Human Resources department is an integral part of every company, with the important job of selecting the right employees. • Voice Assistants: Google Assistant, Apple Siri, and Amazon Alexa are all voice assistants. • Grammar Checkers: This is one of the most widely used applications of natural language processing. • Email Filtering: Have you ever used Gmail?
  • 16. • Sentiment Analysis: Natural language understanding is particularly difficult for machines when it comes to opinions, given that humans often use sarcasm and irony. • Text Classification: Text classification, a text analysis task that also includes sentiment analysis, involves automatically understanding, processing, and categorizing unstructured text. • Text Extraction: Text extraction, or information extraction, automatically detects specific information in a text, such as names, companies, places, and more. This is also known as named entity recognition. • Machine Translation: Machine translation (MT) is one of the first applications of natural language processing.
  • 17. • Text Summarization: There are two ways of using natural language processing to summarize data: extraction-based summarization, which extracts key phrases and creates a summary without adding any extra information, and abstraction-based summarization, which creates new phrases paraphrasing the original source. • Market Intelligence: Marketers can use natural language processing to learn more about their customers and use those insights to create more effective strategies. • Intent Classification: Intent classification consists of identifying the goal or purpose that underlies a text.
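The extraction-based summarization described in slide 17 can be sketched with a simple word-frequency heuristic. This is only a minimal illustration under stated assumptions: the `summarize` function, the tiny `STOPWORDS` set, and the scoring rule (sum of word frequencies per sentence) are hypothetical choices made for this example, not a method given in the slides.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for the illustration.
STOPWORDS = {"the", "was", "a", "is", "of", "and", "to", "in"}


def content_words(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, n=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence):
        # A sentence scores the summed document frequency of its content words.
        return sum(freq[w] for w in content_words(sentence))

    top = sorted(sentences, key=score, reverse=True)[:n]
    return [s for s in sentences if s in top]


text = (
    "NLP helps computers process language. "
    "The weather was nice. "
    "Language models make NLP systems process language better."
)
print(summarize(text, n=1))
```

Because the summary only reuses sentences from the input, nothing new is generated; that is what distinguishes it from abstraction-based summarization. Note that this scoring rule favors long sentences, which a real system would normalize for.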
  • 18. TOOLS FOR NLP • Programming languages and software: Prolog, ALE, Lisp/Scheme, C/C++, Python, Java • Statistical methods: Markov models, probabilistic grammars, text-based analysis • Abstract models: context-free grammars (BNF), attribute grammars, predicate calculus and other semantic models, knowledge-based and ontological methods
  • 19. TOOLS FOR NLP EXAMPLE • Natural Language Processing tools help companies get insights from unstructured text data such as emails, online reviews, social media posts, and more. • Many online tools make NLP accessible to a business; they fall into two groups: • open-source libraries and • SaaS tools.
  • 20. • Open-source: • Open-source libraries are free, flexible, and allow developers to fully customize them. • However, they are not always cost-effective in practice: you need to spend time building and training open-source tools before you can reap the benefits. • SaaS tools: • SaaS tools are a good alternative if you don't want to invest a lot of time building complex infrastructure or spend money on extra resources.
  • 21. • MonkeyLearn: • A user-friendly, NLP-powered platform that helps gain insights from text data. • To get started, its pre-trained models can be used for text analysis tasks such as sentiment analysis, topic classification, or keyword extraction. • Aylien: • A SaaS API that uses deep learning and NLP to analyze large volumes of text-based data, such as academic publications, real-time content from news outlets, and social media data. • It can be used for NLP tasks like text summarization, article extraction, entity extraction, and sentiment analysis, among others.
  • 22. • IBM Watson: • IBM Watson is a suite of AI services hosted in the IBM Cloud. • One of its key features is Natural Language Understanding, which lets you identify and extract keywords, categories, emotions, entities, and more. • Amazon Comprehend: • Amazon Comprehend is an NLP service integrated with the Amazon Web Services infrastructure. • This API can be used for NLP tasks such as sentiment analysis, topic modeling, entity recognition, and more.
  • 23. • Google Cloud: • The Google Cloud Natural Language API provides several pre-trained models for sentiment analysis, content classification, and entity extraction, among others. • It offers AutoML Natural Language, which allows you to build customized machine learning models. • As part of the Google Cloud infrastructure, it uses Google question-answering and language understanding technology.
  • 24. • NLTK: • The Natural Language Toolkit (NLTK) for Python is one of the leading tools for building NLP models. • Focused on research and education in the NLP field, NLTK is supported by an active community, as well as a range of tutorials for language processing, sample datasets, and resources that include a comprehensive Language Processing and Python handbook.
  • 25. • Stanford CoreNLP: • Stanford CoreNLP is a popular library built and maintained by the NLP community at Stanford University. • It is written in Java, so a JDK must be installed, but it has APIs in most programming languages. • The CoreNLP toolkit can perform a variety of NLP tasks, such as part-of-speech tagging, tokenization, or named entity recognition. • Its main advantages include scalability and optimization for speed, making it a good choice for complex tasks.
  • 26. • TextBlob: • TextBlob is a Python library that works as an extension of NLTK, offering the same NLP tasks through a more intuitive and user-friendly interface. • Its learning curve is gentler than that of other open-source libraries, so it is an excellent choice for beginners who want to tackle NLP tasks like sentiment analysis, text classification, part-of-speech tagging, and more.
  • 27. • SpaCy: • SpaCy is one of the newest open-source Python libraries for Natural Language Processing. • It is fast, easy to use, well-documented, and designed to support large volumes of data, and it ships with a series of pretrained NLP models that make the job even easier. • Unlike NLTK or CoreNLP, which offer a number of algorithms for each task, SpaCy keeps its menu short and serves up the best available option for each task at hand.
  • 28. • GenSim: • Gensim is a highly specialized Python library that largely deals with topic modeling tasks using algorithms like Latent Dirichlet Allocation (LDA). • It is also excellent at recognizing text similarities, indexing texts, and navigating different documents. • This library is fast, scalable, and good at handling large volumes of data.
  • 29. NATURAL LANGUAGES VS. COMPUTER LANGUAGES • Ambiguity is the primary difference between natural and computer languages. • Formal programming languages are designed to be unambiguous, i.e. they can be defined by a grammar that produces a unique parse for each sentence in the language. • Programming languages are also designed for efficient parsing, i.e. they are deterministic context-free languages (DCFLs). • A sentence in a DCFL can be parsed in O(n) time, where n is the length of the string.
  • 30. LINGUISTIC ORGANIZATION OF NLP • NLP architecture and stages of processing, with ambiguity at every stage: • Phonetics and phonology • Morphology • Lexical Analysis • Syntactic Analysis • Semantic Analysis • Pragmatics • Discourse
  • 31. PHONETICS • Processing of speech • Challenges • Homophones: bank (finance) vs. bank (river bank) • Near homophones: maatraa vs. maatra (Hindi) • Word boundary: aajaayenge can be aa jaayenge (will come) or aaj aayenge (will come today) • Phrase boundary: PhD students are especially exhorted to attend as such seminars are integral to one's post graduate education • Disfluency: ah, um, ahem etc., as in "The best part of my job is … well … the best part of my job is the responsibility."
  • 32. WORD SEGMENTATION • Breaking a string of characters (graphemes) into a sequence of words. • In some written languages (e.g. Chinese) words are not separated by spaces. • Even in English, characters other than white space can separate words, e.g. , ; . - : ( ) • Examples from English URLs: • jumptheshark.com → jump the shark com • myspace.com/pluckerswingbar → myspace com pluckers wing bar, or myspace com plucker swing bar
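The URL example in slide 32 can be reproduced with a small dictionary-based segmenter that enumerates every way to split a string into known words. This is a sketch under assumptions: the `segmentations` function and the five-word `VOCAB` are hypothetical and chosen only so that both readings of "pluckerswingbar" appear.

```python
def segmentations(text, vocabulary):
    """Return every way to split `text` into a sequence of vocabulary words."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in vocabulary:
            # Keep this prefix and segment the remainder recursively.
            for rest in segmentations(text[end:], vocabulary):
                results.append([prefix] + rest)
    return results


# Hypothetical toy vocabulary for the illustration.
VOCAB = {"plucker", "pluckers", "swing", "wing", "bar"}

for words in segmentations("pluckerswingbar", VOCAB):
    print(" ".join(words))
```

Both segmentations are valid with respect to the dictionary, so choosing between them needs extra evidence, for example word or word-pair frequencies.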
  • 33. MORPHOLOGICAL ANALYSIS • Morphology is the field of linguistics that studies the internal structure of words. • A morpheme is the smallest linguistic unit that has semantic meaning, i.e. a unit that cannot be further divided (e.g. in, come, -ing, forming incoming). • Morphological analysis is the task of segmenting a word into its morphemes: • carried → carry + ed (past tense) • independently → in + (depend + ent) + ly • Googlers → (Google + er) + s (plural) • unlockable → un + (lock + able)?
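A toy suffix-stripping analyzer shows the idea behind the "carried → carry + ed" example in slide 33. The rule table, the `analyze` function, and the stem list are assumptions made for this sketch; real analyzers use far richer rule sets, typically compiled into finite-state transducers.

```python
# Each rule: (surface suffix, text restored on the stem, morpheme label).
# Hypothetical toy rules, listed longest suffix first.
SUFFIX_RULES = [
    ("ied", "y", "ed"),   # carried -> carry + ed (spelling change y -> i)
    ("ing", "", "ing"),   # walking -> walk + ing
    ("ed", "", "ed"),     # jumped  -> jump + ed
    ("s", "", "s"),       # boys    -> boy + s
]


def analyze(word, stems):
    """Return (stem, suffix) analyses whose stem is in the known stem list."""
    analyses = []
    for suffix, restored, label in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)] + restored
            if stem in stems:
                analyses.append((stem, label))
    return analyses


STEMS = {"carry", "walk", "jump", "boy"}  # hypothetical mini-lexicon
for word in ["carried", "walking", "boys"]:
    print(word, analyze(word, STEMS))
```

Checking candidate stems against a lexicon is what prevents over-stripping: without it, "carried" would also be analyzed as "carri + ed".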
  • 34. MORPHOLOGY • A collection of rules for forming words from root words • Nouns: plural (boy → boys); gender marking (Ravi → Ravina) • Verbs: tense (stretch → stretched); aspect (e.g. perfective: sit → had sat); modality (e.g. request: khaanaa → khaaiie) • Crucial first step in NLP
  • 35. • Languages rich in morphology: e.g. Dravidian languages, Hungarian, Turkish, Indian languages. • Languages poor in morphology: Chinese, English. • Languages with rich morphology have the advantage of easier processing at higher stages of processing. • A task of interest to computer science: creating finite-state machines for word morphology.
  • 36. LEXICAL ANALYSIS • Essentially refers to dictionary access and obtaining the properties of a word. • e.g. dog: noun (lexical property); takes 's' in the plural (morphological property); animate (semantic property); 4-legged (semantic property); carnivore (semantic property) • Challenge: lexical or word sense disambiguation
  • 37. LEXICAL DISAMBIGUATION • First step: part-of-speech disambiguation • Dog as a noun (animal) • Dog as a verb (to pursue, go after, or follow somebody closely) • Ex. A shadowy figure was dogging their every move. • Sense disambiguation: • Dog (as an animal) • Dog (as a very detestable person)
  • 38. • Needs word relationships in a context: • The chair emphasized the need for adult education. • Very common in day-to-day communication: a satellite channel ad, "Watch what you want, when you want", plays on two senses of watch: a wristwatch and watching something.
  • 39. TECHNOLOGICAL DEVELOPMENTS BRING IN NEW TERMS, ADDITIONAL MEANINGS/NUANCES FOR EXISTING TERMS • Justify: as in justify the right margin (word processing context) • Xeroxed: a new verb • Digital Trace: a new expression • Communifaking: pretending to talk on a mobile phone when you are actually not • Discomgooglation: anxiety/discomfort at not being able to access the internet • Helicopter Parenting: over-parenting
  • 40. AMBIGUITY OF MULTI-WORDS • The grandfather kicked the bucket after suffering from cancer. (kicked the bucket: to die) • This job is a piece of cake. (a piece of cake: something very easy) • He is the dark horse of the match. (dark horse: someone who has a surprising ability or skill)
  • 41. AMBIGUITY OF MULTI-WORDS: GOOGLE TRANSLATION • Marathi output: आजोबांनी कॅन्सरने ग्रासल्यानंतर बादलीला लाथ मारली. हे काम केकचा तुकडा आहे. तो सामन्याचा डार्क हॉर्स आहे. • The idioms are rendered word for word (a bucket being kicked, a slice of cake, a transliterated "dark horse"), so the intended meanings are lost.
  • 42. SYNTACTIC ANALYSIS
  • 43. 1. PART OF SPEECH (POS) TAGGING • Annotate each word in a sentence with a PoS tag • Useful for subsequent syntactic parsing and word sense disambiguation • Example: I/Pro ate/V the/Det spaghetti/N with/Prep meatballs/N
  • 44. 2. PHRASE CHUNKING • Phrase chunking is a phase of natural language processing that segments a sentence into its sub-constituents, such as noun, verb, and prepositional phrases, abbreviated as NP, VP, and PP, respectively.
  • 45. • Find all non-recursive noun phrases (NPs) and verb phrases (VPs) in a sentence: • [NP I] [VP ate] [NP the spaghetti] [PP with] [NP meatballs]. • [NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to] [NP only # 1.8 billion] [PP in] [NP September]
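The chunking of "I ate the spaghetti with meatballs" from slides 43 to 45 can be sketched as a single left-to-right pass over already-tagged words. The tag names follow the slide (Pro, V, Det, N, Prep); the `chunk` and `render` functions and the grouping rules (a pronoun alone, an optional determiner plus nouns, a run of verbs, a single preposition) are simplifying assumptions for this example.

```python
def chunk(tagged):
    """Group (word, tag) pairs into non-recursive NP, VP, and PP chunks."""
    chunks = []
    i = 0
    while i < len(tagged):
        word, tag = tagged[i]
        if tag == "Pro":                      # a pronoun is an NP on its own
            chunks.append(("NP", [word]))
            i += 1
        elif tag in ("Det", "N"):             # optional determiner + nouns
            words = []
            if tag == "Det":
                words.append(word)
                i += 1
            while i < len(tagged) and tagged[i][1] == "N":
                words.append(tagged[i][0])
                i += 1
            chunks.append(("NP", words))
        elif tag == "V":                      # a run of verbs forms a VP
            words = []
            while i < len(tagged) and tagged[i][1] == "V":
                words.append(tagged[i][0])
                i += 1
            chunks.append(("VP", words))
        elif tag == "Prep":                   # a preposition alone is a PP chunk
            chunks.append(("PP", [word]))
            i += 1
        else:                                 # anything else stays outside chunks
            chunks.append(("O", [word]))
            i += 1
    return chunks


def render(chunks):
    """Format chunks in the bracket notation used on the slide."""
    return " ".join("[{} {}]".format(label, " ".join(words)) for label, words in chunks)


tagged = [("I", "Pro"), ("ate", "V"), ("the", "Det"),
          ("spaghetti", "N"), ("with", "Prep"), ("meatballs", "N")]
print(render(chunk(tagged)))
```

The chunker never nests one phrase inside another, which is what "non-recursive" means here: "with meatballs" is left as a PP chunk followed by an NP chunk rather than being attached to either the verb or "the spaghetti".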
  • 46. SYNTAX PROCESSING STAGE • Example sentence: I like mangoes
  • 47. PARSER • The word "parsing" comes from the Latin word "pars" (meaning "part"). Parsing analyzes the structure of a text against a grammar. • It is also called syntactic analysis or syntax analysis. • Syntax analysis checks the text for well-formedness by comparing it against the rules of a formal grammar. • A sentence like "Give me hot ice-cream" is grammatically well-formed but semantically odd, so rejecting it requires semantic checks in addition to the grammar rules.
  • 48. • Parsing (definition): • The process of analyzing strings of symbols in natural language according to the rules of a formal grammar.
  • 49. RELEVANCE OF PARSING IN NLP • A parser reports syntax errors. • It helps recover from commonly occurring errors so that processing of the remainder of the input can continue. • A parse tree is created with the help of a parser. • A parser is used to create a symbol table, which plays an important role in NLP. • A parser is also used to produce intermediate representations (IR).
  • 50. • Parsing strategy (driven by grammar): • S → NP VP • NP → N | PRON • VP → V NP | V PP • N → mangoes • PRON → I • V → like
  • 51. • Parse of "I like mangoes": I (PRO) like (V) mangoes (NOUN)
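The grammar from slide 50 is small enough to parse with a few lines of top-down, backtracking code. The sketch below is an illustration, not the parser used in the slides: the `GRAMMAR` and `LEXICON` dictionaries transcribe the slide's rules, while the function names and the tuple-based tree format are assumptions. The slide gives no rule for PP, so the `VP → V PP` alternative can never succeed here.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"], ["PRON"]],
    "VP": [["V", "NP"], ["V", "PP"]],   # PP has no rule on the slide
}
LEXICON = {"N": {"mangoes"}, "PRON": {"i"}, "V": {"like"}}


def parse(symbol, tokens, pos):
    """Yield (tree, next_pos) for every way `symbol` can match at tokens[pos]."""
    if symbol in LEXICON:
        if pos < len(tokens) and tokens[pos].lower() in LEXICON[symbol]:
            yield (symbol, tokens[pos]), pos + 1
        return
    for production in GRAMMAR.get(symbol, []):
        for children, end in expand(production, tokens, pos):
            yield (symbol, *children), end


def expand(symbols, tokens, pos):
    """Yield (subtrees, next_pos) for matching a sequence of symbols in order."""
    if not symbols:
        yield [], pos
        return
    for tree, mid in parse(symbols[0], tokens, pos):
        for rest, end in expand(symbols[1:], tokens, mid):
            yield [tree] + rest, end


def parse_sentence(sentence):
    """Return all parse trees that cover the whole sentence."""
    tokens = sentence.split()
    return [tree for tree, end in parse("S", tokens, 0) if end == len(tokens)]


print(parse_sentence("I like mangoes"))
```

This simple strategy works because the grammar has no left-recursive rules; a rule such as NP → NP PP would make a naive top-down parser loop forever, which is one reason chart parsers are used for larger grammars.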
  • 52. • NLTK for NLP • Example sentence: "A clever fox was jumping over the wall"
  • 53. CHALLENGES IN SYNTACTIC PROCESSING: STRUCTURAL AMBIGUITY • Preposition phrase attachment • I saw the boy with a telescope. (who has the telescope?) • I saw the mountain with a telescope. (world knowledge: a mountain cannot be an instrument of seeing) • I saw the boy with the pony tail. (world knowledge: a pony tail cannot be an instrument of seeing)
  • 54. SEMANTIC TASKS • Representation in terms of predicate calculus, semantic nets, frames, conceptual dependencies and scripts • John gave a book to Mary • Give: action, Agent: John, Object: Book, Recipient: Mary • Challenge: ambiguity in semantic role labeling • (English) Visiting aunts can be a nuisance • (Hindi) aapko mujhe mithaai khilaanii padegii (ambiguous in Marathi too)
  • 55. PREDICATE CALCULUS • A predicate is an expression of one or more variables defined on some specific domain. • A predicate with variables can be made a proposition either by assigning a value to the variable or by quantifying the variable.
  • 56. • Consider the following statement: Ram is a student. • In terms of predicate calculus, "is a student" is the predicate and Ram is the subject.
  • 57. • Let's denote "Ram" as x and "is a student" as predicate P; then the statement can be written as P(x). • Generally, a statement expressed by a predicate must have at least one object associated with the predicate. • In this case, Ram is the object associated with predicate P.
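The two ways of turning a predicate into a proposition (assigning a value, or quantifying the variable) can be illustrated with an ordinary Boolean function. This is a sketch only: the set of students and the domain are made-up data, and the names Sita and Ravi in it are hypothetical additions to the slide's "Ram" example.

```python
# Hypothetical facts for the illustration.
STUDENTS = {"Ram", "Sita"}
DOMAIN = ["Ram", "Sita", "Ravi"]


def P(x):
    """Predicate P(x): 'x is a student'."""
    return x in STUDENTS


# Assigning a value to the variable gives a proposition with a truth value.
ram_is_student = P("Ram")

# Quantifying the variable over the domain also gives a proposition.
someone_is_student = any(P(x) for x in DOMAIN)    # there exists x such that P(x)
everyone_is_student = all(P(x) for x in DOMAIN)   # for all x, P(x)

print(ram_is_student, someone_is_student, everyone_is_student)
```

On its own, `P` has no truth value; only `P("Ram")` or a quantified statement over `DOMAIN` does, which mirrors the distinction made in slide 55.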
  • 58. WORD SENSE DISAMBIGUATION (WSD) • Words in natural language usually have several possible meanings: • Ravi has a strong interest in computer science. • Ravi pays a large amount of interest on his credit card. • For many tasks (such as question answering and translation) the proper sense of each ambiguous word in a sentence must be determined.
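One classic way to pick a sense for "interest" in the two sentences of slide 58 is to compare the sentence with a short definition of each sense and choose the one sharing the most words (a simplified form of the Lesk algorithm). The slides do not name a method, so this technique is swapped in for illustration; the sense labels, the two glosses, and the stop-word list below are hand-written assumptions rather than entries from a real dictionary.

```python
import re

# Hypothetical hand-written glosses for two senses of "interest".
SENSES = {
    "curiosity": "a feeling of wanting to know or learn about a subject "
                 "such as science or music",
    "finance": "money paid regularly for the use of borrowed money "
               "such as a loan or credit card debt",
}
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "or", "to",
             "such", "as", "his", "has"}


def content_words(text):
    """Lower-cased alphabetic tokens with stop words removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(sentence, target, senses=SENSES):
    """Pick the sense whose gloss shares the most content words with the sentence."""
    context = content_words(sentence) - {target}
    return max(senses, key=lambda s: len(context & content_words(senses[s])))


print(disambiguate("Ravi has a strong interest in computer science.", "interest"))
print(disambiguate("Ravi pays a large amount of interest on his credit card.", "interest"))
```

The approach depends entirely on lexical overlap: "pays" does not match "paid" in the gloss, so the second sentence is resolved only through "credit" and "card". Stemming or richer context would make it more robust.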
  • 60. PRAGMATICS • Very hard problem • Model user intention • Tourist (in a hurry, checking out of the hotel, motioning to the service boy): Boy, go upstairs and see if my sandals are under the divan. Do not be late. I just have 15 minutes to catch the train. • Boy (running upstairs and coming back panting): Yes sir, they are there.
  • 61. DISCOURSE • Processing of a sequence of sentences. • Mother to John: "John, go to school. It is open today. Should you bunk? Father will be very angry." • Ambiguity of: • open • bunk what? • Why will the father be angry? • Requires a complex chain of reasoning and application of world knowledge • Ambiguity of father: father as parent or father as headmaster
  • 62. • Determine which phrases in a document refer to the same underlying entity • John put the carrot on the plate and ate it
  • 63. APPLICATION OF NLP
  • 64. INFORMATION EXTRACTION (IE) • Identify phrases in text that refer to specific types of entities and relations • Named entity recognition is the task of identifying names of people, places, organizations, etc. in text, e.g. "Michael Dell is the CEO of Dell Computer Corporation and lives in Austin, Texas." (people: Michael Dell; organizations: Dell Computer Corporation; places: Austin, Texas) • Relation extraction identifies specific relations between entities in the same sentence.
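The Michael Dell sentence from slide 64 can be used to sketch both steps: entity recognition by lookup in a list of known names (a gazetteer) and relation extraction with a single hand-written pattern. Everything here is an assumption for illustration: the `GAZETTEER` contents, the label names, the `CEO_OF` relation name, and the regular expression, which only covers sentences of the form "X is the CEO of Y".

```python
import re

# Hypothetical gazetteer: known names mapped to entity types.
GAZETTEER = {
    "Michael Dell": "PERSON",
    "Dell Computer Corporation": "ORGANIZATION",
    "Austin": "PLACE",
    "Texas": "PLACE",
}

# Capitalized word sequences on either side of the phrase "is the CEO of".
CEO_PATTERN = re.compile(
    r"(?P<person>[A-Z]\w+(?: [A-Z]\w+)*) is the CEO of "
    r"(?P<org>[A-Z]\w+(?: [A-Z]\w+)*)"
)


def find_entities(text, gazetteer=GAZETTEER):
    """Return (name, label) pairs for gazetteer names, in order of appearance."""
    found = []
    for name, label in gazetteer.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), name, label))
    return [(name, label) for _, name, label in sorted(found)]


def find_ceo_relations(text):
    """Return ('CEO_OF', person, organization) triples matched by the pattern."""
    return [("CEO_OF", m.group("person"), m.group("org"))
            for m in CEO_PATTERN.finditer(text)]


sentence = ("Michael Dell is the CEO of Dell Computer Corporation "
            "and lives in Austin, Texas.")
print(find_entities(sentence))
print(find_ceo_relations(sentence))
```

Lookup and patterns are brittle (an unseen name or a rephrased sentence is missed), which is why practical systems learn these decisions from annotated text instead.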
  • 65. QUESTION ANSWERING • Directly answer natural language questions based on information presented in a corpus of textual documents (e.g. the web) • When was Barack Obama born? • August 4, 1961 • Who was president when Barack Obama was born? • John F. Kennedy • How many presidents have there been since Barack Obama was born? • 9
  • 66. SENTIMENT ANALYSIS • Extract subjective information, usually from a set of documents, often using online reviews to determine "polarity" about specific objects. • Especially useful for identifying trends of public opinion in social media, for the purpose of marketing.
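The idea of determining "polarity" in slide 66 can be sketched with a word-list approach: count positive and negative words and compare. The two word lists and the `polarity` function are hypothetical and far too small for real use; they only show the mechanism.

```python
import re

# Hypothetical mini-lexicons for the illustration.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}


def polarity(text):
    """Classify text as positive, negative, or neutral by counting lexicon words."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


reviews = [
    "The battery life is great and I love the screen",
    "Terrible service, bad food",
    "It arrived on Monday",
]
for review in reviews:
    print(polarity(review), "|", review)
```

A pure word count ignores negation ("not good") and the sarcasm and irony mentioned in slide 16, which is why practical systems use trained classifiers rather than fixed lists.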
  • 67. MACHINE TRANSLATION (MT) • Translate a sentence from one natural language to another. • Ex. • English: I would like to take admission in MGM University • Marathi: मला एमजीएम विद्यापीठात प्रवेश घ्यायचा आहे • Hindi: मैं एमजीएम विश्वविद्यालय में प्रवेश लेना चाहता हूँ
  • 68. AMBIGUITY RESOLUTION IS REQUIRED FOR TRANSLATION • Syntactic and semantic ambiguities must be properly resolved for correct translation: • "Jak plays the guitar." → "जैक गिटार बजाता है।" • "Jak plays soccer." → "जैक फुटबॉल खेलता है।"
  • 70. EARLY HISTORY: 1950’S • Shannon (the father of information theory) explored probabilistic models of natural language (1951) • Chomsky (the extremely influential linguist) developed formal models of syntax, i.e. finite state and context-free grammars (1956) • First computational parser developed at UPenn as a cascade of finite-state transducers (Joshi, 1961; Harris, 1962) • Bayesian methods developed for optical character recognition (OCR) (Bledsoe & Browning, 1959) Prof. Jayanand Kamble 70
  • 71. HISTORY: 1960’S • Work at MIT AI lab on question answering (BASEBALL) and dialog (ELIZA) • Semantic network models of language for question answering (Simmons, 1965). • First electronic corpus collected, Brown corpus, 1 million words (Kucera and Francis, 1967) • Bayesian methods used to identify document authorship (The Federalist papers) (Mosteller &Wallace, 1964) Prof. Jayanand Kamble 71
  • 72. HISTORY: 1970s • “Natural language understanding” systems developed that tried to support deeper semantic interpretation • SHRDLU (Winograd, 1972) performs tasks in the “blocks world” based on NL instruction • Schank et al. (1972, 1977) developed systems for conceptual representation of language and for understanding short stories using hand-coded knowledge of scripts, plans, and goals. • Prolog programming language developed to support logic-based parsing (Colmerauer, 1975). • Initial development of hidden Markov models (HMMs) for statistical speech recognition (Baker, 1975; Jelinek, 1976)
  • 73. HISTORY: 1980s • Development of more complex (mildly context-sensitive) grammatical formalisms, e.g. unification grammar, tree-adjoining grammar etc. • Symbolic work on discourse processing and NL generation. • Initial use of statistical (HMM) methods for syntactic analysis (POS tagging) (Church, 1988).
  • 74. HISTORY: 1990s • Rise of statistical methods and empirical evaluation causes a “scientific revolution” in the field. • Initial annotated corpora developed for training and testing systems for POS tagging, parsing, word sense disambiguation (WSD), information extraction, MT, etc. • First statistical machine translation systems developed at IBM for the Canadian Hansards corpus (Brown et al., 1990) • First robust statistical parsers developed (Magerman, 1995; Collins, 1996; Charniak, 1997) • First systems for robust information extraction developed (e.g. MUC competitions)
  • 75. HISTORY: 2000 ONWARDS • Information extraction from social networks • Information retrieval • Cross-lingual information access • Machine Translation (statistical, hybrid etc.) • Biomedical text mining • Discourse processing
  • 76. NLP VS PLP
  • 77. NLP VS PLP (COMPARISON) • Domain of discourse: NLP is broad (what can be expressed); PLP is narrow (what can be computed) • Lexicon: NLP large/complex; PLP small/simple • Grammatical constructs: NLP many and varied (declarative, interrogative, fragments etc.); PLP few (declarative, imperative) • Meanings of an expression: NLP many; PLP one • Tools and techniques: NLP uses morphological analysis, syntactic analysis, semantic analysis, integration of world knowledge; PLP uses lexical analysis, context-free parsing, code generation/compiling, interpreting
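The "one meaning per expression" property of programming languages can be seen with Python's standard `ast` module, which always returns the same single parse tree for a given expression. The helper `parse_shape` and the sample expression below are illustrative assumptions, not material from the slides.

```python
import ast


def parse_shape(node):
    """Reduce a parsed arithmetic expression to nested (operator, left, right) tuples."""
    if isinstance(node, ast.Expression):
        return parse_shape(node.body)
    if isinstance(node, ast.BinOp):
        return (
            type(node.op).__name__,
            parse_shape(node.left),
            parse_shape(node.right),
        )
    if isinstance(node, ast.Name):
        return node.id
    raise ValueError(f"unsupported node: {type(node).__name__}")


# The grammar fixes precedence, so "a - b * c" has exactly one structure:
# multiplication binds tighter than subtraction.
tree = parse_shape(ast.parse("a - b * c", mode="eval"))
```

A natural-language word such as "plays" has no comparable fixed rule; its reading has to be inferred from context, which is why NLP needs semantic analysis and world knowledge on top of parsing.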
  • 78. UNIT 01 END

Editor's Notes

  1. Whenever you search something on Google, after typing 2-3 letters, it shows you the possible search terms. Or, if you search for something with typos, it corrects them and still finds relevant results for you. Isn’t it amazing? Have you ever used Google Translate to find out what a particular word or phrase is in a different language? I’m sure it’s a YES!!