This document provides an introduction and background on natural language processing (NLP). It discusses the key categories of linguistic knowledge needed for NLP, including phonetics, morphology, syntax, semantics, pragmatics, and discourse. It also explains that NLP tasks involve resolving ambiguity at these different levels of language. Common models and algorithms used in NLP are described, such as state machines, formal rule systems, logic, and probabilistic models. Machine learning approaches are also discussed for automatically learning NLP representations.
4. Background
• How is the syllabus framed?
– Starts with words and their components
– Then how words fit together – syntax
– Then the meaning of words, phrases, sentences – semantics
– Issues of coherent texts, dialog and translation
• Technologies to cover the theory:
– Regular expressions,
– IR,
– Context-free grammars,
– unification, first-order predicate calculus,
– Hidden Markov and other probabilistic models
5. Background - Goal of NLP
• Develop techniques and tools to build practical and robust systems
that can communicate with users in one or more natural languages

                Natural Lang.          Artificial Lang.
Lexical         >100,000 words         ~100 words
Syntax          Complex                Simple
Semantics       1 word --> several     1 word --> 1 meaning
                meanings
6. Background
• What do we mean by “natural language”?
• A language that is used for everyday communication by
humans; Ex: English, Hindi
• In contrast to programming languages and mathematical
notations:
– natural languages have evolved as they pass from generation to
generation
– and are hard to pin down with explicit rules.
7. Background
• What do we mean by “Natural Language Processing”?
• Natural Language Processing—or NLP for short—in a wide
sense covers any kind of computer manipulation of natural
language.
– NLP is the branch of computer science focused on developing systems
that allow computers to communicate with people using everyday
language.
– Also called Computational Linguistics
– Also concerns how computational methods can aid the understanding
of human language
– it could be as simple as counting word frequencies to compare
different writing styles.
– At the other extreme, NLP involves “understanding” complete human
utterances, at least to the extent of being able to give useful responses
to them.
8. Background
• Technologies based on NLP are becoming increasingly
widespread.
– Ex: phones and handheld computers support predictive text and
handwriting recognition;
– web search engines give access to information locked up in
unstructured text;
– Machine translation allows us to retrieve texts written in Chinese and
read them in Spanish.
– providing more natural human-machine interfaces,
– and more sophisticated access to stored information,
• language processing has come to play a central role in the
multilingual information society.
9. Background
• Linguistics
• 100 years of history as a scientific discipline
• Computational Linguistics
• 40-year history as a part of CS
• Language understanding
• Over the last 15 years, it has emerged as an industry reaching millions
of people with
– IR and ML available on the internet
– Speech recognition on computers
10. Background
• How is the course related to other Dept courses?
• traditionally taught in different courses in different
departments:
– speech recognition in electrical engineering depts
– parsing, semantic interpretation, and pragmatics in natural language
processing courses in computer science departments,
– Computational morphology and phonology in computational
linguistics courses in linguistics departments
11. Forms of Natural Language
• The input/output of a NLP system can be:
– written text: newspaper articles, letters, manuals, prose, …
– Speech: read speech (radio, TV, dictations), conversational
speech, commands, …
• To process written text, we need:
– lexical,
– syntactic,
– semantic
knowledge about the language
– discourse information,
– real world knowledge
12. Forms of Natural Language
• To process written text, we need:
– lexical, syntactic, semantic knowledge about the language
– discourse information, real world knowledge
• To process spoken language, we need
– everything above
plus
– speech recognition
– speech synthesis
13. Technologies
• Speech recognition
– Spoken language is recognized and
transformed into text as in
dictation systems, into commands
as in robot control systems, or into
some other internal representation.
• Speech synthesis
– Utterances in spoken language are
produced from text (text-to-speech
systems) or from internal
representations of words or
sentences (concept-to-speech
systems)
14. Components of NLP
• Natural Language Understanding
– Mapping the given input in the natural language into a useful
representation.
– Different levels of analysis are required:
morphological analysis,
syntactic analysis,
semantic analysis,
discourse analysis, …
• Natural Language Generation
– Producing output in the natural language from some internal
representation.
– Different levels of synthesis are required:
• deep planning (what to say),
• syntactic generation
16. Introduction
NLP Applications
1. Mundane applications (Simple) such as word counting and
automatic hyphenation, spelling correction, text
categorization
2. Cutting edge applications (Complex) such as automated
question answering on the Web, and real-time spoken
language translation, speech recognition, machine
translation, information extraction, sentiment analysis
What distinguishes these language processing applications from
other data processing systems?
• Their use of knowledge of language.
18. Introduction
• Language processing applications vs. other data processing
systems
• Ex: Unix wc program, to count the total number of bytes,
words, and lines in a text file.
• When used to count bytes and lines, wc is an ordinary data
processing application.
• However, when it is used to count the words in a file it
requires knowledge about what it means to be a word,
• and thus becomes a language processing system.
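The wc contrast above can be made concrete in a short sketch (illustrative only: the sentence and the tokenization rules below are invented for this example):

```python
# wc-style counts: bytes and lines need no linguistic knowledge,
# but counting "words" depends on what we decide a word is.
import re

text = "Dave, I'm sorry -- I'm afraid I can't do that.\n"

n_bytes = len(text.encode("utf-8"))  # pure data processing
n_lines = text.count("\n")           # pure data processing

# Splitting on whitespace is already a (crude) linguistic decision:
# "Dave," keeps its comma, and "--" counts as a word.
naive_words = text.split()

# A tokenizer that separates punctuation makes a different decision.
tokens = re.findall(r"[A-Za-z']+|--|[.,!?]", text)

print(n_bytes, n_lines)  # byte/line counts are uncontroversial
print(len(naive_words))  # 10 whitespace-separated "words"
print(len(tokens))       # 12 tokens -- a different word count
```

The two word counts disagree, and neither is "wrong": the difference is exactly the knowledge of language that turns counting into a language processing task.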
19. 1.1 Knowledge in Speech and
Language Processing
Natural Language Processing Tasks
20. Natural Language Processing Tasks
• Processing natural language text involves
various syntactic, semantic and pragmatic
tasks in addition to other problems.
21. 1.1 Knowledge in Speech and Language Processing- 6
Categories of Linguistic Knowledge
Acoustic/Phonetic:  sound waves --> words
Syntax:             words --> parse trees
Semantics:          parse trees --> literal meaning
Pragmatics:         literal meaning --> meaning (contextualized)
22. 1.1 Knowledge in Speech and Language Processing- 6 Categories
of Linguistic Knowledge
1. Phonetics and Phonology — The study of linguistic sounds
2. Morphology —The study of the meaningful components of
words
3. Syntax —The study of the structural relationships between
words
4. Semantics — The study of meaning
5. Pragmatics — The study of how language is used to
accomplish goals
6. Discourse—The study of linguistic units larger than a single
utterance
23. 1.1 Knowledge in Speech and Language
Processing- 6 Categories of Linguistic Knowledge
• The tasks of analyzing an incoming audio signal and
• recovering the exact sequence of words and generating its
response
• require knowledge about phonetics and phonology,
• which can help model how words are pronounced in
colloquial (used in ordinary or familiar conversation; not
formal or literary) speech (Chapters 4 and 5).
24. 1.1 Knowledge in Speech and Language
Processing- 6 Categories of Linguistic Knowledge
• Producing and recognizing the variations of individual
words (e.g., recognizing that doors is plural)
• requires knowledge about morphology,
• which captures information about the shape and behavior
of words in context (Chapters 2 and 3).
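As a toy illustration of this kind of morphological knowledge (not a real analyzer — the rules and word list below are invented for this sketch), a few English plural patterns can be handled like this:

```python
# A toy morphological analyzer for English plural nouns. Real
# morphology needs a lexicon and many more rules; this only shows
# the idea of mapping an inflected form to a stem plus features.
def analyze_plural(word):
    """Return (stem, features) for a few regular plural patterns."""
    irregular = {"geese": "goose", "mice": "mouse", "children": "child"}
    if word in irregular:
        return irregular[word], {"number": "plural"}
    if word.endswith("ies") and len(word) > 3:
        return word[:-3] + "y", {"number": "plural"}   # ponies -> pony
    if word.endswith("es") and word[:-2].endswith(("s", "x", "ch", "sh")):
        return word[:-2], {"number": "plural"}          # boxes -> box
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1], {"number": "plural"}          # doors -> door
    return word, {"number": "singular"}

for w in ["doors", "boxes", "ponies", "geese", "glass"]:
    print(w, "->", analyze_plural(w))
```

Note how even this sketch needs ordered rules and an exception list ("glass" is not a plural of "glas") — morphological knowledge is more than suffix stripping.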
25. 1.1 Knowledge in Speech and Language
Processing- 6 Categories of Linguistic Knowledge
• Syntax: the knowledge needed to order and group words
together
HAL, the pod bay door is open.
HAL, is the pod bay door open?
I’m I do, sorry that afraid Dave I’m can’t.
(Dave, I’m sorry I’m afraid I can’t do that.)
26. 1.1 Knowledge in Speech and Language
Processing- 6 Categories of Linguistic Knowledge
• Lexical semantics: knowledge of the meanings of the
component words
• Compositional semantics: knowledge of how these
components combine to form larger meanings
– To know that Dave’s command is actually about
opening the pod bay door, rather than an inquiry about
the day’s lunch menu.
27. Word Sense Disambiguation (WSD)
• Words in natural language usually have a fair number of
different possible meanings.
– Ellen has a strong interest in computational linguistics.
– Ellen pays a large amount of interest on her credit card.
• For many tasks (question answering, translation), the
proper sense of each ambiguous word in a sentence must
be determined.
28. 28
1.1 Knowledge in Speech and Language
Processing- 6 Categories of Linguistic Knowledge
• Pragmatics: the appropriate use of the kind of polite and
indirect language
No or
No, I won’t open the door.
I’m sorry, I’m afraid, I can’t.
I won’t.
29. 29
1.1 Knowledge in Speech and Language
Processing- 6 Categories of Linguistic Knowledge
• discourse conventions: knowledge of correctly structuring
these such conversations
– HAL chooses to engage in a structured conversation
relevant to Dave’s initial request. HAL’s correct use of
the word that in its answer to Dave’s request is a simple
illustration of the kind of between-utterance device
common in such conversations.
Dave, I’m sorry I’m afraid I can’t do that.
30. 30
1.1 Knowledge in Speech and Language
Processing- 6 Categories of Linguistic Knowledge
• Phonology – concerns how words are related to the sounds that
realize them.
• Morphology – concerns how words are constructed from more
basic meaning units called morphemes. A morpheme is the primitive
unit of meaning in a language.
• Syntax – concerns how words can be put together to form correct
sentences and determines what structural role each word plays in the
sentence and what phrases are subparts of other phrases.
• Semantics – concerns what words mean and how these meaning
combine in sentences to form sentence meaning. The study of
context-independent meaning.
31. 31
1.1 Knowledge in Speech and Language
Processing- 6 Categories of Linguistic Knowledge
• Pragmatics – concerns how sentences are used in different situations
and how use affects the interpretation of the sentence.
• Discourse – concerns how the immediately preceding sentences
affect the interpretation of the next sentence.For example, interpreting
pronouns and interpreting the temporal aspects of the information.
• World Knowledge – includes general knowledge about the world.
What each language user must know about the other’s beliefs and
goals.
33. 33
1.2 Ambiguity
• A perhaps surprising fact about the six categories of linguistic
knowledge is that most or all tasks in speech and language processing
can be viewed as resolving ambiguity at one of these levels.
• We say some input is ambiguous
– if there are multiple alternative linguistic structures than can be built for it.
• The spoken sentence, I made her duck, has five different meanings.
– (1.1) I cooked waterfowl for her.
– (1.2) I cooked waterfowl belonging to her.
– (1.3) I created the (plaster?) duck she owns.
– (1.4) I caused her to quickly lower her head or body.
– (1.5) I waved my magic wand and turned her into undifferentiated
waterfowl.
34. 34
1.2 Ambiguity
• These different meanings are caused by a number of ambiguities.
– Duck can be a verb or a noun, while her can be a dative pronoun or a
possessive pronoun.
– The word make can mean create or cook.
– Finally, the verb make is syntactically ambiguous in that it can be
transitive (1.2), or it can be ditransitive (1.5).
– Finally, make can take a direct object and a verb (1.4), meaning that the
object (her) got caused to perform the verbal action (duck).
– In a spoken sentence, there is an even deeper kind of ambiguity; the first
word could have been eye or the second word maid.
35. Why NL Understanding is hard?
• Natural language is extremely rich in form and structure, and
very ambiguous.
– How to represent meaning,
– Which structures map to which meaning structures.
• One input can mean many different things. Ambiguity can be at different
levels.
– Lexical (word level) ambiguity -- different meanings of words
– Syntactic ambiguity -- different ways to parse the sentence
– Interpreting partial information -- how to interpret pronouns
– Contextual information -- context of the sentence may affect the meaning of
that sentence.
• Many input can mean the same thing.
• Interaction among components of the input is not clear.
• Noisy input (e.g. speech)
36. 1.2 Ambiguity
• Ways to resolve or disambiguate these ambiguities:
– Deciding whether duck is a verb or a noun can be solved by
part-of-speech tagging.
– Deciding whether make means “create” or “cook” can be solved by
word sense disambiguation.
– Resolution of part-of-speech and word sense ambiguities are two
important kinds of lexical disambiguation.
• A wide variety of tasks can be framed as lexical disambiguation
problems.
– For example, a text-to-speech synthesis system reading the word
lead needs to decide whether it should be pronounced as in lead
pipe or as in lead me on.
• Deciding whether her and duck are part of the same entity (as in (1.1)
or (1.4)) or are different entities (as in (1.2)) is an example of syntactic
disambiguation and can be addressed by probabilistic parsing.
• Ambiguities that don’t arise in this particular example (like whether a
given sentence is a statement or a question) will also be resolved, for
example by speech act interpretation.
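Since the abstract above describes an HMM-based POS tagger, a tiny sketch may help make lexical disambiguation concrete. The sketch below runs the Viterbi algorithm over "I made her duck" with a three-tag set; every tag name and probability here is invented for illustration, not estimated from any corpus.

```python
# A minimal sketch of HMM part-of-speech disambiguation with the Viterbi
# algorithm. All probabilities below are toy numbers, not corpus estimates.

def viterbi(words, tags, start, trans, emit):
    """Return the most probable tag sequence for `words` under a toy HMM."""
    # best[i][t] = (probability, previous tag) of the best path that
    # ends with tag t at position i.
    best = [{t: (start.get(t, 0.0) * emit.get((t, words[0]), 0.0), None)
             for t in tags}]
    for i in range(1, len(words)):
        col = {}
        for t in tags:
            p, prev = max(
                (best[i - 1][s][0] * trans.get((s, t), 0.0)
                 * emit.get((t, words[i]), 0.0), s)
                for s in tags)
            col[t] = (p, prev)
        best.append(col)
    # Trace the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for col in reversed(best[1:]):
        tag = col[tag][1]
        path.insert(0, tag)
    return path

tags = ["PRON", "VERB", "NOUN"]
start = {"PRON": 0.8, "VERB": 0.1, "NOUN": 0.1}
trans = {("PRON", "VERB"): 0.8, ("VERB", "PRON"): 0.6,
         ("PRON", "NOUN"): 0.5}
emit = {("PRON", "I"): 1.0, ("VERB", "made"): 1.0,
        ("PRON", "her"): 0.5,
        ("NOUN", "duck"): 0.6, ("VERB", "duck"): 0.4}

result = viterbi(["I", "made", "her", "duck"], tags, start, trans, emit)
```

With these made-up numbers the transition probabilities outweigh the emission preference for the noun reading, so the tagger picks the verb reading of duck (the "caused her to lower her head" sense); different numbers would pick the waterfowl sense.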
37. Why is Language Ambiguous?
• Having a unique linguistic expression for every
possible conceptualization that could be conveyed
would make language overly complex and linguistic
expressions unnecessarily long.
• Allowing resolvable ambiguity permits shorter
linguistic expressions, i.e. data compression.
• Language relies on people’s ability to use their
knowledge and inference abilities to properly resolve
ambiguities.
• Infrequently, disambiguation fails, i.e. the
compression is lossy.
38. Natural Languages vs. Computer Languages
• Ambiguity is the primary difference between natural
and computer languages.
• Formal programming languages are designed to be
unambiguous, i.e. they can be defined by a grammar
that produces a unique parse for each sentence in the
language.
• Programming languages are also designed for efficient
(deterministic) parsing, i.e. they are deterministic
context-free languages (DCFLs).
– A sentence in a DCFL can be parsed in O(n) time where n is
the length of the string.
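The deterministic, linear-time parsing just described can be sketched with a recursive-descent parser for a tiny LL(1) expression grammar (the grammar and token format here are invented for illustration). One token of lookahead decides every step, so each token is consumed exactly once and parsing is O(n), with a unique parse per sentence:

```python
# Recursive-descent parser for a tiny deterministic (LL(1)) grammar:
#   Expr -> Term ('+' Term)*
#   Term -> NUM | '(' Expr ')'
# One token of lookahead decides every branch, so parsing is O(n).

def parse(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def eat(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1
    def expr():
        node = term()
        while peek() == "+":        # lookahead decides deterministically
            eat("+")
            node = ("+", node, term())
        return node
    def term():
        nonlocal pos
        if peek() == "(":
            eat("(")
            node = expr()
            eat(")")
            return node
        tok = peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        pos += 1
        return int(tok)
    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree
```

For example, `parse(["1", "+", "(", "2", "+", "3", ")"])` yields the single parse tree `("+", 1, ("+", 2, 3))` — there is never a second parse to consider, which is exactly what makes programming languages unlike natural language.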
40. 1.3 Models and Algorithms
• The most important models:
– state machines,
– formal rule systems,
– logic,
– probability theory and
– other machine learning tools
• The most important algorithms for these models:
– state space search algorithms and
– dynamic programming algorithms
41. 1.3 Models and Algorithms
• State machines are
– formal models that consist of states, transitions among
states, and an input representation.
• Some of the variations of this basic model:
– Deterministic and non-deterministic finite-state
automata,
– finite-state transducers, which can write to an output
device,
– weighted automata, Markov models, and hidden
Markov models, which have a probabilistic component.
42. 1.3 Models and Algorithms
• Closely related to the above procedural models are their declarative
counterparts: formal rule systems.
– regular grammars and regular relations, context-free grammars,
feature-augmented grammars, as well as probabilistic variants of them
all.
• State machines and formal rule systems are the main tools used when dealing
with knowledge of phonology, morphology, and syntax.
• The algorithms associated with both state-machines and formal rule systems
typically involve a search through a space of states representing hypotheses
about an input.
• Representative tasks include
– searching through a space of phonological sequences for a likely input
word in speech recognition, or
– searching through a space of trees for the correct syntactic parse of an
input sentence.
• Among the algorithms that are often used for these tasks are well-known graph
algorithms such as depth-first search, as well as heuristic variants such as
best-first, and A* search.
• The dynamic programming paradigm is critical to the computational
tractability of many of these approaches by ensuring that redundant
computations are avoided.
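A standard illustration of the dynamic programming paradigm in language processing is minimum edit distance: each subproblem (the distance between two prefixes) is computed once and stored, so the overlapping subproblems of a naive recursive search are never recomputed. The sketch below uses unit costs for insertion, deletion, and substitution, which is one common convention.

```python
# Minimum edit distance by dynamic programming. d[i][j] holds the cost of
# transforming the first i characters of src into the first j of tgt;
# each cell is computed once from its three neighbors and then reused.

def edit_distance(src, tgt):
    m, n = len(src), len(tgt)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i                      # i deletions
    for j in range(1, n + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if src[i - 1] == tgt[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # delete
                          d[i][j - 1] + 1,        # insert
                          d[i - 1][j - 1] + sub)  # substitute or copy
    return d[m][n]
```

For example, `edit_distance("intention", "execution")` is 5 under these unit costs. The same table-filling idea underlies the Viterbi algorithm for HMM decoding and chart parsing for syntax.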
43. 1.3 Models and Algorithms
• The third model that plays a critical role in capturing
knowledge of language is logic.
• We will discuss
– first order logic, also known as the predicate calculus,
as well as
– such related formalisms as feature-structures,
– semantic networks, and
– conceptual dependency.
• These logical representations have traditionally been the
tool of choice when dealing with knowledge of semantics,
pragmatics, and discourse (although, as we will see,
applications in these areas are increasingly relying on the
simpler mechanisms used in phonology, morphology, and
syntax).
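The feature structures mentioned above are combined by unification: two structures unify if they agree on every shared feature, and the result merges the information from both. The sketch below handles only flat (non-nested) structures represented as dicts — real feature structures are recursive and support shared substructure, so this is a deliberately simplified illustration.

```python
# Toy unification of flat feature structures represented as dicts.
# Two structures unify iff they agree on every feature they share;
# the result carries the union of their information.

def unify(fs1, fs2):
    for feat in fs1.keys() & fs2.keys():
        if fs1[feat] != fs2[feat]:
            return None            # feature clash: unification fails
    return {**fs1, **fs2}
```

For instance, an NP marked `{"cat": "NP", "num": "sg"}` unifies with a verb's subject requirement `{"num": "sg", "person": 3}`, yielding a structure with all three features, while `{"num": "sg"}` and `{"num": "pl"}` clash and fail — which is how agreement constraints are enforced.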
44. 1.3 Models and Algorithms
• Each of the other models (state machines, formal rule systems,
and logic) can be augmented with probabilities.
• One major use of probability theory is to solve the many
kinds of ambiguity problems that we discussed earlier;
– almost any speech and language processing problem can
be recast as: “given N choices for some ambiguous input,
choose the most probable one”.
• Another major advantage of probabilistic models is that
– they are one of a class of machine learning models.
• Machine learning research has focused on ways to
automatically learn the various representations described
above;
– automata, rule systems, search heuristics, classifiers.
• These systems can be trained on large corpora and can be
used as a powerful modeling technique, especially in places
where we don’t yet have good causal models.
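The "given N choices, choose the most probable one" recipe can be sketched as a tiny word sense disambiguator in the naive-Bayes style: score each candidate sense of duck by the probability of the observed context words under that sense, and take the argmax. The sense names, counts, smoothing vocabulary size (10), and priors below are all invented for illustration.

```python
from collections import Counter

# "Given N choices for an ambiguous input, choose the most probable one":
# a toy naive-Bayes-style disambiguator for two senses of "duck".
# All counts and priors are hand-made, purely for illustration.

sense_counts = {
    "waterfowl": Counter({"cooked": 3, "pond": 5, "her": 2}),
    "lower":     Counter({"her": 4, "head": 6, "quickly": 3}),
}
prior = {"waterfowl": 0.5, "lower": 0.5}

def disambiguate(context):
    def score(sense):
        counts = sense_counts[sense]
        total = sum(counts.values())
        p = prior[sense]
        for w in context:
            # add-one smoothing over an assumed 10-word vocabulary
            p *= (counts[w] + 1) / (total + 10)
        return p
    return max(sense_counts, key=score)
```

A context like `["cooked", "her"]` selects the waterfowl sense, while `["quickly", "head"]` selects the "lower one's head" sense. Replacing the hand-made counts with counts from an annotated corpus is exactly the kind of automatic learning of representations described above.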