SlideShare a Scribd company logo
1 of 16
Introduction
K.A.S.H. Kulathilake
B.Sc.(Sp.Hons.)IT, MCS, Mphil, SEDA(UK)
Introduction
• The goal of this new field is to get computers to perform useful
tasks involving human language, tasks like enabling human-machine
communication, improving human-human communication, or
simply doing useful processing of text or speech.
– E.g CONVERSATIONAL agent. The HAL 9000 computer
– The goal of machine translation is to automatically translate a
document from one language to another. Machine translation is far
from a solved problem; we will cover the algorithms currently used in
the field, as well as important component tasks.
– Web-based question answering. This is a generalization of simple web
search, where instead of just typing keywords a user might ask
complete questions, ranging from easy to hard. Answering done based
on inference (drawing conclusions based on known facts).
Knowledge in Speech and Language
• HAL must be able to recognize words from an audio signal
and to generate an audio signal from a sequence of words.
• These tasks of speech recognition and speech synthesis
tasks require knowledge about
– Phonetics - how words are pronounced in terms of sequences
of sounds
– Phonology - how each of these sounds is realized acoustically.
• Morphology, the way words break down into component
parts that carry meanings like singular versus plural.
• The knowledge needed to order and group words together
comes under the heading of Syntax.
Knowledge in Speech and Language
(Cont…)
• Now consider a question answering system dealing with the
following question:
How much Chinese silk was exported to Western Europe
by the end of the 18th century?
• In order to answer this question we need to know something about
lexical semantics, the meaning of all the words (export, or silk) as
well as compositional semantics (What exactly constitutes Western
Europe as opposed to Eastern or Southern Europe, what does end
mean when combined with the 18th century.)
• We also need to know something about the relationship of the
words to the syntactic structure.
• For example we need to know that by the end of the 18th century is
a temporal end-point, and not a description of the agent.
Knowledge in Speech and Language
(Cont…)
• Another kind of pragmatic or discourse knowledge is
required to answer the question
How many states were in the United States
that year?
• What year is that year? In order to interpret words like
that year a question answering system need to
examine the earlier questions that were asked; in this
case the previous question talked about the year that
Lincoln was born.
• Thus this task of co-reference resolution makes use of
knowledge about how words like that or pronouns like
it or she refer to previous parts of the discourse.
Knowledge in Speech and Language
(Cont…)
• To summarize, engaging in complex language behavior
requires various kinds of knowledge of language:
– Phonetics and Phonology— knowledge about linguistic
sounds.
– Morphology— knowledge of the meaningful components
of words.
– Syntax— knowledge of the structural relationships
between words.
– Semantics—knowledge of meaning.
– Pragmatics— knowledge of the relationship of meaning to
the goals and intentions of the speaker.
– Discourse— knowledge about linguistic units larger than a
single utterance.
Ambiguity
• We say some input is ambiguous if there are multiple
alternative linguistic structures that can be built for it.
• Consider the spoken sentence I made her duck.
• Here’s five different meanings this sentence could have
(see if you can think of some more), each of which
exemplifies an ambiguity at some level:
– I cooked waterfowl for her.
– I cooked waterfowl belonging to her.
– I created the (plaster?) duck she owns.
– I caused her to quickly lower her head or body.
– I waved my magic wand and turned her into
undifferentiated waterfowl.
Ambiguity (Cont…)
• These different meanings are caused by a number of ambiguities.
• First, the words duck and her are morphologically or syntactically
ambiguous in their part-of-speech.
• Duck can be a verb or a noun, while her can be a dative pronoun or
a possessive pronoun.
• Second, the word make is semantically ambiguous; it can mean
create or cook.
• Finally, the verb make is syntactically ambiguous in a different way.
Make can be transitive, that is, taking a single direct object (1.2), or
it can be ditransitive, that is, taking two objects (1.5), meaning that
the first object (her) got made into the second object (duck).
• Finally, make can take a direct object and a verb (1.4), meaning that
the object (her) got caused to perform the verbal action (duck).
Ambiguity (Cont…)
• For example deciding whether duck is a verb or a noun
can be solved by part-of-speech tagging. Deciding
whether make means “create” or “cook” can be solved
by word sense disambiguation.
• Resolution of part-of-speech and word sense
ambiguities are two important kinds of lexical
disambiguation.
• Deciding whether her and duck are part of the same
entity (as in (1.1) or (1.4)) or are different entity (as in
(1.2)) is an example of Syntactic disambiguation can be
addressed by probabilistic parsing.
Models & Algorithms
• Most important models are state machines, rule
systems, logic, probabilistic models, and vector-space
models.
• These models, in turn, lend themselves to a small
number of algorithms, among the most important of
which are state space search algorithms such as
dynamic programming, and machine learning
algorithms such as classifiers and EM and other
learning algorithms.
• State machines and formal rule systems are the main
tools used when dealing with knowledge of phonology,
morphology, and syntax.
Models & Algorithms (Cont…)
• The third model that plays a critical role in
capturing knowledge of language is logic.
• We will discuss first order logic, also known as the
predicate calculus, as well as such related
formalisms as lambda-calculus, feature-
structures, and semantic primitives.
• These logical representations have traditionally
been used for modeling semantics and
pragmatics, although more recent work has
focused on more robust techniques drawn from
non-logical lexical semantics.
Models & Algorithms (Cont…)
• Probabilistic models are crucial for capturing every kind of linguistic
knowledge.
• Each of the other models (state machines, formal rule systems, and logic)
can be augmented with probabilities. For example the state machine can
be augmented with probabilities to become the weighted automaton or
Markov model.
• Hidden Markov models or HMMs, which are used everywhere in the field,
in part-of-speech tagging, speech recognition, dialogue understanding,
text-to-speech, and machine translation.
• The key advantage of probabilistic models is their ability to solve the many
kinds of ambiguity problems that we discussed earlier; almost any speech
and language processing problem can be recast as:
• “given N choices for some ambiguous input, choose the most probable
one”.
• Finally, vector-space models, based on linear algebra, underlie
information retrieval and many treatments of word meanings.
Models & Algorithms (Cont…)
• Processing language using any of these models typically involves a
search through a space of states representing hypotheses about an
input.
• In speech recognition, we search through a space of phone
sequences for the correct word.
• In parsing, we search through a space of trees for the syntactic
parse of an input sentence.
• In machine translation, we search through a space of translation
hypotheses for the correct translation of a sentence into another
language.
• For non-probabilistic tasks, such as state machines, we use well-
known graph algorithms such as depth-first search.
• For probabilistic tasks, we use heuristic variants such as best-first
and A* search, and rely on dynamic programming algorithms for
computational tractability.
Models & Algorithms (Cont…)
• For many language tasks, we rely on machine learning
tools like classifiers and sequence models.
• Classifiers like decision trees, support vector
machines, Gaussian Mixture Models and logistic
regression are very commonly used.
• A hidden Markov model is one kind of sequence
model; other are Maximum Entropy Markov Models
or Conditional Random Fields.
• Another tool that is related to machine learning is
methodological; the use of distinct training and test
sets, statistical techniques like cross-validation, and
careful evaluation of our trained systems
The State of the Art
• Travelers calling Amtrak, United Airlines and other travel-
providers interact with conversational agents that guide
them through the process of making reservations and
getting arrival and departure information.
• Luxury car makers such as Mercedes-Benz models provide
automatic speech recognition and text-to-speech systems
that allow drivers to control their environmental,
entertainment and navigational systems by voice. A similar
spoken dialogue system has been deployed by astronauts
on the International Space Station .
• Blinkx, and other video search companies, provide search
services for million of hours of video on the Web by using
speech recognition technology to capture the words in the
sound track.
The State of the Art (Cont…)
• Google provides cross-language information retrieval and translation
services where a user can supply queries in their native language to search
collections in another language. Google translates the query, finds the
most relevant pages and then automatically translates them back to the
user’s native language.
• Interactive tutors, based on lifelike animated characters, serve as tutors
for children learning to read, and as therapists for people dealing with
aphasia and Parkinsons disease.
• Text analysis companies such as Nielsen Buzzmetrics, Umbria, and
Collective Intellect, provide marketing intelligence based on automated
measurements of user opinions, preferences, attitudes as expressed in
weblogs, discussion forums and user groups.
• Large educational publishers such as Pearson, as well as testing services
like ETS, use automated systems to analyze thousands of student essays,
grading and assessing them in a manner that is indistinguishable from
human graders.

More Related Content

What's hot

Types of machine translation
Types of machine translationTypes of machine translation
Types of machine translation
Rushdi Shams
 
Error Analysis of Rule-based Machine Translation Outputs
Error Analysis of Rule-based Machine Translation OutputsError Analysis of Rule-based Machine Translation Outputs
Error Analysis of Rule-based Machine Translation Outputs
Parisa Niksefat
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
Yasir Khan
 
Amharic WSD using WordNet
Amharic WSD using WordNetAmharic WSD using WordNet
Amharic WSD using WordNet
Seid Hassen
 
Introduction to Translation
Introduction to TranslationIntroduction to Translation
Introduction to Translation
Mohammed Raiyah
 

What's hot (19)

Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Types of machine translation
Types of machine translationTypes of machine translation
Types of machine translation
 
Error Analysis of Rule-based Machine Translation Outputs
Error Analysis of Rule-based Machine Translation OutputsError Analysis of Rule-based Machine Translation Outputs
Error Analysis of Rule-based Machine Translation Outputs
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
CLUE-Aligner: An Alignment Tool to Annotate Pairs of Paraphrastic and Transla...
CLUE-Aligner: An Alignment Tool to Annotate Pairs of Paraphrastic and Transla...CLUE-Aligner: An Alignment Tool to Annotate Pairs of Paraphrastic and Transla...
CLUE-Aligner: An Alignment Tool to Annotate Pairs of Paraphrastic and Transla...
 
Processing Written English
Processing Written EnglishProcessing Written English
Processing Written English
 
5. phases of nlp
5. phases of nlp5. phases of nlp
5. phases of nlp
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Networks and Natural Language Processing
Networks and Natural Language ProcessingNetworks and Natural Language Processing
Networks and Natural Language Processing
 
Spotting The Difference–Machine Versus Human Translation
Spotting The Difference–Machine Versus Human TranslationSpotting The Difference–Machine Versus Human Translation
Spotting The Difference–Machine Versus Human Translation
 
Amharic WSD using WordNet
Amharic WSD using WordNetAmharic WSD using WordNet
Amharic WSD using WordNet
 
NLP_KASHK:Text Normalization
NLP_KASHK:Text NormalizationNLP_KASHK:Text Normalization
NLP_KASHK:Text Normalization
 
Natural language processing
Natural language processing Natural language processing
Natural language processing
 
Machine Translation
Machine TranslationMachine Translation
Machine Translation
 
Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Introduction to Translation
Introduction to TranslationIntroduction to Translation
Introduction to Translation
 
AI_ 3 & 4 Knowledge Representation issues
AI_ 3 & 4 Knowledge Representation issuesAI_ 3 & 4 Knowledge Representation issues
AI_ 3 & 4 Knowledge Representation issues
 

Similar to NLP_KASHK: Introduction

Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
Kuppusamy P
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
Abdullah al Mamun
 
Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxNatural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptx
SHIBDASDUTTA
 
1 Introduction.ppt
1 Introduction.ppt1 Introduction.ppt
1 Introduction.ppt
tanishamahajan11
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
Mariana Soffer
 

Similar to NLP_KASHK: Introduction (20)

REPORT.doc
REPORT.docREPORT.doc
REPORT.doc
 
natural language processing help at myassignmenthelp.net
natural language processing  help at myassignmenthelp.netnatural language processing  help at myassignmenthelp.net
natural language processing help at myassignmenthelp.net
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
NLP pipeline in machine translation
NLP pipeline in machine translationNLP pipeline in machine translation
NLP pipeline in machine translation
 
L1 nlp intro
L1 nlp introL1 nlp intro
L1 nlp intro
 
Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxNatural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptx
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
mt_cat_presentations CAT TRANSLATION PPT
mt_cat_presentations CAT TRANSLATION PPTmt_cat_presentations CAT TRANSLATION PPT
mt_cat_presentations CAT TRANSLATION PPT
 
Computational linguistics
Computational linguistics Computational linguistics
Computational linguistics
 
1 Introduction.ppt
1 Introduction.ppt1 Introduction.ppt
1 Introduction.ppt
 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguistics
 
nlp-01.pptxvvvffffffvvvvvfeddeeddffffffffff
nlp-01.pptxvvvffffffvvvvvfeddeeddffffffffffnlp-01.pptxvvvffffffvvvvvfeddeeddffffffffff
nlp-01.pptxvvvffffffvvvvvfeddeeddffffffffff
 
NLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptNLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.ppt
 
SMT3
SMT3SMT3
SMT3
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
AI Intro to NLP_Module 3.ppt
AI Intro to NLP_Module 3.pptAI Intro to NLP_Module 3.ppt
AI Intro to NLP_Module 3.ppt
 
Lesson 41.pdf
Lesson 41.pdfLesson 41.pdf
Lesson 41.pdf
 
ENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptx
ENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptxENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptx
ENeL_WG3_Survey-AKA4Lexicography-TiberiusHeylenKrek (1).pptx
 

More from Hemantha Kulathilake

More from Hemantha Kulathilake (20)

NLP_KASHK:Parsing with Context-Free Grammar
NLP_KASHK:Parsing with Context-Free Grammar NLP_KASHK:Parsing with Context-Free Grammar
NLP_KASHK:Parsing with Context-Free Grammar
 
NLP_KASHK:Context-Free Grammar for English
NLP_KASHK:Context-Free Grammar for EnglishNLP_KASHK:Context-Free Grammar for English
NLP_KASHK:Context-Free Grammar for English
 
NLP_KASHK:POS Tagging
NLP_KASHK:POS TaggingNLP_KASHK:POS Tagging
NLP_KASHK:POS Tagging
 
NLP_KASHK:Markov Models
NLP_KASHK:Markov ModelsNLP_KASHK:Markov Models
NLP_KASHK:Markov Models
 
NLP_KASHK:Smoothing N-gram Models
NLP_KASHK:Smoothing N-gram ModelsNLP_KASHK:Smoothing N-gram Models
NLP_KASHK:Smoothing N-gram Models
 
NLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language ModelNLP_KASHK:Evaluating Language Model
NLP_KASHK:Evaluating Language Model
 
NLP_KASHK:N-Grams
NLP_KASHK:N-GramsNLP_KASHK:N-Grams
NLP_KASHK:N-Grams
 
NLP_KASHK:Minimum Edit Distance
NLP_KASHK:Minimum Edit DistanceNLP_KASHK:Minimum Edit Distance
NLP_KASHK:Minimum Edit Distance
 
NLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological ParsingNLP_KASHK:Finite-State Morphological Parsing
NLP_KASHK:Finite-State Morphological Parsing
 
NLP_KASHK:Morphology
NLP_KASHK:MorphologyNLP_KASHK:Morphology
NLP_KASHK:Morphology
 
NLP_KASHK:Finite-State Automata
NLP_KASHK:Finite-State AutomataNLP_KASHK:Finite-State Automata
NLP_KASHK:Finite-State Automata
 
NLP_KASHK:Regular Expressions
NLP_KASHK:Regular Expressions NLP_KASHK:Regular Expressions
NLP_KASHK:Regular Expressions
 
COM1407: File Processing
COM1407: File Processing COM1407: File Processing
COM1407: File Processing
 
COm1407: Character & Strings
COm1407: Character & StringsCOm1407: Character & Strings
COm1407: Character & Strings
 
COM1407: Structures, Unions & Dynamic Memory Allocation
COM1407: Structures, Unions & Dynamic Memory Allocation COM1407: Structures, Unions & Dynamic Memory Allocation
COM1407: Structures, Unions & Dynamic Memory Allocation
 
COM1407: Input/ Output Functions
COM1407: Input/ Output FunctionsCOM1407: Input/ Output Functions
COM1407: Input/ Output Functions
 
COM1407: Working with Pointers
COM1407: Working with PointersCOM1407: Working with Pointers
COM1407: Working with Pointers
 
COM1407: Arrays
COM1407: ArraysCOM1407: Arrays
COM1407: Arrays
 
COM1407: Program Control Structures – Repetition and Loops
COM1407: Program Control Structures – Repetition and Loops COM1407: Program Control Structures – Repetition and Loops
COM1407: Program Control Structures – Repetition and Loops
 
COM1407: Program Control Structures – Decision Making & Branching
COM1407: Program Control Structures – Decision Making & BranchingCOM1407: Program Control Structures – Decision Making & Branching
COM1407: Program Control Structures – Decision Making & Branching
 

Recently uploaded

Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
jaanualu31
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
Epec Engineered Technologies
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
mphochane1998
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
Kamal Acharya
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Kandungan 087776558899
 

Recently uploaded (20)

Worksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptxWorksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptx
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using PipesLinux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
 
Memory Interfacing of 8086 with DMA 8257
Memory Interfacing of 8086 with DMA 8257Memory Interfacing of 8086 with DMA 8257
Memory Interfacing of 8086 with DMA 8257
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdf
 
Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Introduction to Geographic Information Systems
Introduction to Geographic Information SystemsIntroduction to Geographic Information Systems
Introduction to Geographic Information Systems
 

NLP_KASHK: Introduction

  • 2. Introduction • The goal of this new field is to get computers to perform useful tasks involving human language, tasks like enabling human-machine communication, improving human-human communication, or simply doing useful processing of text or speech. – E.g CONVERSATIONAL agent. The HAL 9000 computer – The goal of machine translation is to automatically translate a document from one language to another. Machine translation is far from a solved problem; we will cover the algorithms currently used in the field, as well as important component tasks. – Web-based question answering. This is a generalization of simple web search, where instead of just typing keywords a user might ask complete questions, ranging from easy to hard. Answering done based on inference (drawing conclusions based on known facts).
  • 3. Knowledge in Speech and Language • HAL must be able to recognize words from an audio signal and to generate an audio signal from a sequence of words. • These tasks of speech recognition and speech synthesis tasks require knowledge about – Phonetics - how words are pronounced in terms of sequences of sounds – Phonology - how each of these sounds is realized acoustically. • Morphology, the way words break down into component parts that carry meanings like singular versus plural. • The knowledge needed to order and group words together comes under the heading of Syntax.
  • 4. Knowledge in Speech and Language (Cont…) • Now consider a question answering system dealing with the following question: How much Chinese silk was exported to Western Europe by the end of the 18th century? • In order to answer this question we need to know something about lexical semantics, the meaning of all the words (export, or silk) as well as compositional semantics (What exactly constitutes Western Europe as opposed to Eastern or Southern Europe, what does end mean when combined with the 18th century.) • We also need to know something about the relationship of the words to the syntactic structure. • For example we need to know that by the end of the 18th century is a temporal end-point, and not a description of the agent.
  • 5. Knowledge in Speech and Language (Cont…) • Another kind of pragmatic or discourse knowledge is required to answer the question How many states were in the United States that year? • What year is that year? In order to interpret words like that year a question answering system need to examine the earlier questions that were asked; in this case the previous question talked about the year that Lincoln was born. • Thus this task of co-reference resolution makes use of knowledge about how words like that or pronouns like it or she refer to previous parts of the discourse.
  • 6. Knowledge in Speech and Language (Cont…) • To summarize, engaging in complex language behavior requires various kinds of knowledge of language: – Phonetics and Phonology— knowledge about linguistic sounds. – Morphology— knowledge of the meaningful components of words. – Syntax— knowledge of the structural relationships between words. – Semantics—knowledge of meaning. – Pragmatics— knowledge of the relationship of meaning to the goals and intentions of the speaker. – Discourse— knowledge about linguistic units larger than a single utterance.
  • 7. Ambiguity • We say some input is ambiguous if there are multiple alternative linguistic structures that can be built for it. • Consider the spoken sentence I made her duck. • Here’s five different meanings this sentence could have (see if you can think of some more), each of which exemplifies an ambiguity at some level: – I cooked waterfowl for her. – I cooked waterfowl belonging to her. – I created the (plaster?) duck she owns. – I caused her to quickly lower her head or body. – I waved my magic wand and turned her into undifferentiated waterfowl.
  • 8. Ambiguity (Cont…) • These different meanings are caused by a number of ambiguities. • First, the words duck and her are morphologically or syntactically ambiguous in their part-of-speech. • Duck can be a verb or a noun, while her can be a dative pronoun or a possessive pronoun. • Second, the word make is semantically ambiguous; it can mean create or cook. • Finally, the verb make is syntactically ambiguous in a different way. Make can be transitive, that is, taking a single direct object (1.2), or it can be ditransitive, that is, taking two objects (1.5), meaning that the first object (her) got made into the second object (duck). • Finally, make can take a direct object and a verb (1.4), meaning that the object (her) got caused to perform the verbal action (duck).
  • 9. Ambiguity (Cont…) • For example deciding whether duck is a verb or a noun can be solved by part-of-speech tagging. Deciding whether make means “create” or “cook” can be solved by word sense disambiguation. • Resolution of part-of-speech and word sense ambiguities are two important kinds of lexical disambiguation. • Deciding whether her and duck are part of the same entity (as in (1.1) or (1.4)) or are different entity (as in (1.2)) is an example of Syntactic disambiguation can be addressed by probabilistic parsing.
  • 10. Models & Algorithms • Most important models are state machines, rule systems, logic, probabilistic models, and vector-space models. • These models, in turn, lend themselves to a small number of algorithms, among the most important of which are state space search algorithms such as dynamic programming, and machine learning algorithms such as classifiers and EM and other learning algorithms. • State machines and formal rule systems are the main tools used when dealing with knowledge of phonology, morphology, and syntax.
  • 11. Models & Algorithms (Cont…) • The third model that plays a critical role in capturing knowledge of language is logic. • We will discuss first order logic, also known as the predicate calculus, as well as such related formalisms as lambda-calculus, feature- structures, and semantic primitives. • These logical representations have traditionally been used for modeling semantics and pragmatics, although more recent work has focused on more robust techniques drawn from non-logical lexical semantics.
  • 12. Models & Algorithms (Cont…) • Probabilistic models are crucial for capturing every kind of linguistic knowledge. • Each of the other models (state machines, formal rule systems, and logic) can be augmented with probabilities. For example the state machine can be augmented with probabilities to become the weighted automaton or Markov model. • Hidden Markov models or HMMs, which are used everywhere in the field, in part-of-speech tagging, speech recognition, dialogue understanding, text-to-speech, and machine translation. • The key advantage of probabilistic models is their ability to solve the many kinds of ambiguity problems that we discussed earlier; almost any speech and language processing problem can be recast as: • “given N choices for some ambiguous input, choose the most probable one”. • Finally, vector-space models, based on linear algebra, underlie information retrieval and many treatments of word meanings.
  • 13. Models & Algorithms (Cont…) • Processing language using any of these models typically involves a search through a space of states representing hypotheses about an input. • In speech recognition, we search through a space of phone sequences for the correct word. • In parsing, we search through a space of trees for the syntactic parse of an input sentence. • In machine translation, we search through a space of translation hypotheses for the correct translation of a sentence into another language. • For non-probabilistic tasks, such as state machines, we use well- known graph algorithms such as depth-first search. • For probabilistic tasks, we use heuristic variants such as best-first and A* search, and rely on dynamic programming algorithms for computational tractability.
  • 14. Models & Algorithms (Cont…) • For many language tasks, we rely on machine learning tools like classifiers and sequence models. • Classifiers like decision trees, support vector machines, Gaussian Mixture Models and logistic regression are very commonly used. • A hidden Markov model is one kind of sequence model; other are Maximum Entropy Markov Models or Conditional Random Fields. • Another tool that is related to machine learning is methodological; the use of distinct training and test sets, statistical techniques like cross-validation, and careful evaluation of our trained systems
  • 15. The State of the Art • Travelers calling Amtrak, United Airlines and other travel- providers interact with conversational agents that guide them through the process of making reservations and getting arrival and departure information. • Luxury car makers such as Mercedes-Benz models provide automatic speech recognition and text-to-speech systems that allow drivers to control their environmental, entertainment and navigational systems by voice. A similar spoken dialogue system has been deployed by astronauts on the International Space Station . • Blinkx, and other video search companies, provide search services for million of hours of video on the Web by using speech recognition technology to capture the words in the sound track.
  • 16. The State of the Art (Cont…) • Google provides cross-language information retrieval and translation services where a user can supply queries in their native language to search collections in another language. Google translates the query, finds the most relevant pages and then automatically translates them back to the user’s native language. • Interactive tutors, based on lifelike animated characters, serve as tutors for children learning to read, and as therapists for people dealing with aphasia and Parkinsons disease. • Text analysis companies such as Nielsen Buzzmetrics, Umbria, and Collective Intellect, provide marketing intelligence based on automated measurements of user opinions, preferences, attitudes as expressed in weblogs, discussion forums and user groups. • Large educational publishers such as Pearson, as well as testing services like ETS, use automated systems to analyze thousands of student essays, grading and assessing them in a manner that is indistinguishable from human graders.