AI411: NLP (Natural Language Processing)
Lecture 1: Introduction
Fall 2024
Dr. Ensaf Hussein
Associate Professor, Artificial Intelligence,
School of Information Technology and Computer Science,
Nile University.
In this course we will learn:
1. The foundations of effective modern methods for deep learning
applied to NLP
▪ Basics first, then key methods used in NLP: Word vectors, feed-forward
networks, recurrent networks, attention, encoder-decoder models,
transformers, etc.
2. A big picture understanding of human languages and the difficulties in
understanding and producing them via computers
3. An understanding of and ability to build systems for some of the major
problems in NLP:
▪ Word meaning, machine translation, summarization, question answering
Prerequisites
▪ AIS301: Machine Learning
Tentative Schedule
1 Introduction to NLP
2 NLP Pipeline and Preprocessing
3 Word Embeddings
4 Backpropagation and Neural Networks
5 CFG - Dependency Parsing
6 Recurrent Neural Networks (RNNs) and Language Models
7 Named Entity Recognition (NER) and Part of Speech Tagging
8 Fancy RNNs, Seq2Seq
9 Machine Translation, Attention
10 Transformers
11 Pretraining Models - QA
12 Final Exam Review & Recap
Resources:
Grading policy
• Course Work:
• Lecture Quizzes: 10%
• Lab Quizzes: 10%
• Assignments: 10%
• Project: 20%
• Midterm: 20%
• Final Exam: 30%
• Students scoring less than 30% on the final exam will receive an F in the course
• Students must attend 75% of lectures and labs to be admitted to the final exam
Book Structure
Today’s Agenda
• What is NLP?
• History of NLP
• NLP Tasks
– NLU Applications
– NLG Applications
• What Is Language?
– Building Blocks of language
– Why is Language challenging?
• ML, DL, and NLP: An Overview
• Approaches to NLP
– Heuristics-Based NLP
– Machine Learning for NLP
– Deep Learning for NLP
What is Natural Language Processing (NLP)?
• NLP deals with analyzing, understanding, and generating
human language through computational models.
• While humans communicate using natural language, machines
operate on structured data.
• NLP acts as the intermediary, converting unstructured human
language into structured data that machines can interpret.
History of NLP
2017
History of NLP
• NLP has been through (at least) 3 major eras:
▪1950s-1980s: Linguistic Methods and Handwritten Rules
▪1980s-2013: Corpus/Statistical Methods
▪2013-Now: Deep Learning
• Lucky you! You’re right near the start of a paradigm shift!
1950s - 1980s: Linguistics/Rule Systems
• NLP systems focus on:
▪Linguistics: Grammar rules, sentence structure parsing, etc.
▪Handwritten Rules: Huge sets of logical (if/else) statements
▪Ontologies: Manually created (domain-specific!) knowledge
bases to augment rules above
• Problems:
▪Too complex to maintain
▪Can’t scale!
▪Can’t generalize!
Eliza: 1966
• ELIZA is a simple pattern-based system that uses pattern matching to
recognize phrases like “You are X” and translate them into suitable outputs
like “What makes you think I am X?”.
• ELIZA doesn’t actually need to know anything to mimic a Rogerian
psychotherapist.
• Modern conversational agents are much more than a diversion; they can
answer questions, book flights, or find restaurants, functions for which they
rely on a much more sophisticated understanding of the user’s intent.
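The pattern-matching idea behind ELIZA can be sketched in a few lines. The rules below are illustrative stand-ins, not Weizenbaum’s original script:

```python
import re

# Minimal ELIZA-style responder: each rule pairs a regex with a response
# template, and the captured phrase is echoed back inside the reply.
RULES = [
    (re.compile(r"you are (.*)", re.IGNORECASE), "What makes you think I am {}?"),
    (re.compile(r"i feel (.*)", re.IGNORECASE), "Why do you feel {}?"),
    (re.compile(r"i need (.*)", re.IGNORECASE), "Why do you need {}?"),
]

def respond(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            # Echo the matched phrase, stripping trailing punctuation
            return template.format(match.group(1).rstrip(".!?"))
    return "Please tell me more."  # default when no rule fires

print(respond("You are very clever"))  # -> What makes you think I am very clever?
print(respond("I need a vacation."))   # -> Why do you need a vacation?
```

Note how the system “understands” nothing: it only rewrites surface patterns, which is exactly why such rule sets became too complex to maintain and could not generalize.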
1980s – 2013: Corpus/Statistical Methods
• NLP starts using Machine Learning methods
• Use statistical learning over huge datasets of unstructured text
▪Corpus: Collection of text documents
▪e.g. Supervised Learning: Machine Translation
▪e.g. Unsupervised Learning: Deriving Word "Meanings"
(vectors)
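The unsupervised “word meaning” idea can be illustrated with raw co-occurrence counts over a toy corpus. This is only a sketch of the distributional intuition; real corpus-era methods used far larger corpora plus weighting and dimensionality reduction:

```python
from collections import Counter

# Toy corpus (invented for illustration)
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

window = 2
vectors = {}  # word -> Counter of neighboring words within the window
for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        neighbors = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        vectors.setdefault(word, Counter()).update(neighbors)

# Words used in similar contexts end up with similar count vectors:
print(vectors["cat"])
print(vectors["dog"])
```

Both “cat” and “dog” co-occur with “the”, “sat”, and “on”, so their vectors overlap heavily, which is the statistical signal these methods exploited to derive word “meanings”.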
2013 - Now: Deep Learning
• Deep Learning made its name with images first
• 2013: Deep Learning has major NLP breakthroughs
▪In 2012, researchers used a deep neural network (AlexNet) to win the
ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
▪This state-of-the-art approach beat other ML approaches with roughly
half their error rate (16% vs. 26%)
• Very useful for unified processing of Language + Images
The Role of Language in NLP
Language is a complex structure made up of different
components, such as phonemes, morphemes, syntax, and
context. Understanding these components is essential for
building NLP systems that can process and interpret human
language effectively.
Building Blocks of Language and their applications
Phonemes
• The smallest unit of sound in a
language.
• Though they carry no meaning by
themselves, they form the basis
for speech recognition systems.
• For instance, "p" in "pat" and "b"
in "bat" are distinct phonemes in
English.
Morphemes
• The smallest unit of meaning in a
language, often seen as prefixes,
suffixes, or roots.
• For example, in the word
"unbreakable," "un-", "break,"
and "-able" are all morphemes.
Syntax
• The set of rules that govern how
sentences are structured in a
language.
• It dictates how words are combined
to form grammatically correct
sentences.
• Parsing techniques in NLP rely on
syntax to understand sentence
structure.
Context
• Context is crucial in determining the meaning of a sentence,
especially when words have multiple meanings.
• For example, the word "bank" could refer to a financial
institution or the side of a river.
• Context helps resolve such ambiguities.
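As a toy illustration of context resolving ambiguity, each sense of “bank” can be scored by its word overlap with the sentence. This is a simplified Lesk-style sketch, and the sense signatures below are invented for illustration:

```python
# Hypothetical sense signatures: words typical of each sense of "bank"
SENSES = {
    "financial institution": {"money", "deposit", "loan", "account", "cash"},
    "river side": {"river", "water", "shore", "fishing", "stream"},
}

def disambiguate(sentence: str) -> str:
    # Pick the sense whose signature overlaps most with the context words
    context = set(sentence.lower().split())
    return max(SENSES, key=lambda sense: len(SENSES[sense] & context))

print(disambiguate("she opened an account at the bank to deposit cash"))
print(disambiguate("they sat on the bank of the river fishing"))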
Levels of NLP
• From lowest to highest: Phonemes (speech) and Morphemes (text), then
Syntax, Semantics, and Discourse
• Early rules engines worked at the lower levels; corpus methods reached the
middle levels; modern Deep Learning is about at the level of semantics and
discourse!
NLP: Speech vs Text
• Natural Language can refer to Text or Speech
• Goal of both is the same: translate raw data (text or speech) into underlying
concepts (NLU), then possibly into the other form (NLG)
• [Diagram: Text and Speech both map into a shared concept space via NLU
and back out via NLG; Text-to-Speech and Speech-to-Text convert between
the two forms]
NLP Tasks and Application
NLU vs. NLG Applications
• NLU:
– ML on Text (Classification, Regression, Clustering)
– Document Recommendation
– Language Identification
– Natural Language Search
– Sentiment Analysis
– Text Summarization
– Extracting Word/Document Meaning (vectors)
– Relationship Extraction
– Topic Modeling
– …and more!
• NLG:
– Image Captioning
– (Better) Text Summarization
– Machine Translation
– Question Answering/Chatbots
– …so much more
• Notice NLU is almost a prerequisite for NLG
NLU Applications
• Document Classification: sorting “documents” (discrete collections of text)
into categories
▪Example: classify movie reviews as positive vs. negative
• Document Recommendation: choosing the most relevant document based
on some information
▪Example: show the most relevant webpages for a query to a search engine
• Topic Modeling: breaking a set of documents into topics at the word level
▪Example: find documents belonging to a certain topic
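A document classifier of this kind can be sketched with a tiny Naive Bayes model. The training reviews below are invented for illustration:

```python
import math
from collections import Counter, defaultdict

# Made-up labeled "reviews" standing in for a real training corpus
train = [
    ("a great and enjoyable film", "pos"),
    ("wonderful acting and a great plot", "pos"),
    ("a boring and terrible film", "neg"),
    ("terrible plot and awful acting", "neg"),
]

word_counts = defaultdict(Counter)  # label -> word frequencies
label_counts = Counter()
vocab = set()
for text, label in train:
    label_counts[label] += 1
    for word in text.split():
        word_counts[label][word] += 1
        vocab.add(word)

def classify(text: str) -> str:
    scores = {}
    for label in label_counts:
        # log P(label) + sum of log P(word | label), with add-one smoothing
        score = math.log(label_counts[label] / len(train))
        total = sum(word_counts[label].values())
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("great film"))         # -> pos
print(classify("awful boring plot"))  # -> neg
```

With only four training sentences this is a toy, but the same add-one-smoothed Bayes rule is what corpus-era classifiers applied to real review datasets.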
NLG: Image Captioning
• Automatically generate captions for images
Captions automatically generated.
Source: https://cs.stanford.edu/people/karpathy/cvpr2015.pdf
NLG: Machine Translation
• Automatically translate text between languages
Example from Google®’s machine translation system (2016)
Source: https://ai.googleblog.com/2016/09/a-neural-network-for-machine.html
NLG: Text Summarization
• Automatically generate text summaries of documents
▪ Example: generate headlines of news articles
Source: https://ai.googleblog.com/2016/08/text-summarization-with-tensorflow.html
NLG: Question Answering
• figure credit: Phani Marupaka
NLG: Dialog Systems
Part-of-Speech Tagging
Some/determiner questioned/verb (past) if/prep. Tim/proper Cook/proper
’s/poss. first/adj. product/noun would/modal be/verb a/det.
breakaway/adjective hit/noun for/prep. Apple/proper ./punc.
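A crude rule-based tagger conveys the idea. The rules and the noun default below are illustrative and will make mistakes (e.g. adjectives fall through to noun); real taggers use lexicons plus statistical or neural models:

```python
import re

# Ordered rules: first matching pattern wins
RULES = [
    (re.compile(r"^(a|an|the|some)$", re.IGNORECASE), "determiner"),
    (re.compile(r"^(would|could|may|might)$", re.IGNORECASE), "modal"),
    (re.compile(r"^(if|for|of|in|on|at)$", re.IGNORECASE), "prep."),
    (re.compile(r"^(be|is|are|was|were)$", re.IGNORECASE), "verb"),
    (re.compile(r".*ed$"), "verb (past)"),
    (re.compile(r"^[A-Z].*"), "proper"),
    (re.compile(r"^[.,!?]$"), "punc."),
]

def tag(token: str) -> str:
    for pattern, label in RULES:
        if pattern.match(token):
            return label
    return "noun"  # crude default for everything else

sentence = "Some questioned if Tim Cook would be a breakaway hit for Apple ."
print([(t, tag(t)) for t in sentence.split()])
```

Even this handful of rules recovers most of the tags in the example sentence, which is roughly how early heuristic taggers worked before statistical methods took over.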
Syntactic Parsing
[S [NP Cook ’s first product] [VP may not be [NP a breakaway hit]]]
• Parsing proceeds by first identifying the noun phrases (NP), then the verb
phrase (VP), then the full sentence (S)
Named Entity Recognition
Some questioned if Tim Cook’s first product would be a breakaway hit for Apple.
(Tim Cook → PERSON, Apple → ORGANIZATION)
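As a sketch, a heuristic that treats runs of capitalized words as entity candidates already recovers “Tim Cook” and “Apple”, though it also misfires on sentence-initial words like “Some”; real NER systems use trained sequence models:

```python
import re

def entity_spans(sentence: str):
    # Runs of capitalized words, e.g. "Tim Cook", are entity candidates
    return re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", sentence)

text = "Some questioned if Tim Cook's first product would be a breakaway hit for Apple."
print(entity_spans(text))  # includes "Tim Cook" and "Apple" (and "Some", a false positive)
```

The false positive on “Some” illustrates why capitalization alone is not enough and why NER is treated as a learning problem.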
Entity Linking
Some questioned if Tim Cook’s first product would be a breakaway hit for Apple.
(“Tim Cook” and “Apple” are linked to their entries in a knowledge base)
Coreference Resolution
Some questioned if Tim Cook’s first product would be a breakaway hit for Apple.
It’s the company’s first new device since he became CEO.
• “the company” refers to Apple, and “he” refers to Tim Cook
• But what does “It” refer to? Resolving such references is the task of
coreference resolution
Reading Comprehension
• Once there was a boy named Fritz who loved to draw. He drew everything.
In the morning, he drew a picture of his cereal with milk. His papa said,
“Don’t draw your cereal. Eat it!”
• After school, Fritz drew a picture of his bicycle. His uncle said, “Don't draw
your bicycle. Ride it!”
• …
• What did Fritz draw first?
• A) the toothpaste
• B) his mama
• C) cereal and milk
• D) his bicycle
Sentence Similarity
Input (sentence pair) → Output (similarity score)
• “Other ways are needed.” / “We must find other ways.” → 4.4
• “I absolutely do believe there was an iceberg in those waters.” / “I don't
believe there was any iceberg at all anywhere near the Titanic.” → 1.2
• “Pakistan bomb victims’ families end protest” / “Pakistan bomb victims to
be buried after protest ends” → 2.6
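A crude baseline for sentence similarity is cosine similarity between bag-of-words count vectors. This is a sketch only; the 0-5 scores in the examples come from trained models, not from this formula:

```python
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    # Bag-of-words vectors from whitespace tokens (no stemming, punctuation kept)
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

print(cosine("Other ways are needed.", "We must find other ways."))
print(cosine("apple", "banana"))  # no shared words -> 0.0
```

Because it only counts shared surface words, this baseline misses paraphrase and negation entirely (the two iceberg sentences share many words yet mean opposite things), which is why similarity is learned by models instead.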
47
he bent down and searched the large container, trying to find
anything else hidden in it other than the _____
Word Prediction
48
he turned to one of the cops beside him. “search the entire
coffin.” the man nodded and bustled forward towards the coffin.
he bent down and searched the large container, trying to find
anything else hidden in it other than the _____
Word Prediction
Language Models
• A model like the one in the previous examples is called a
language model.
• A language model is trained on a huge amount of texts, trying
to learn the distribution of the words in the language.
• This can be done by training the model to predict the words
that follow a sentence, or trying to recover masked words from
their surrounding context.
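The word-distribution idea can be sketched with a bigram model over a toy corpus: count which word follows which, then predict the most frequent continuation. Modern language models learn vastly richer distributions with neural networks, but the training objective is the same in spirit:

```python
from collections import Counter, defaultdict

# Toy training text (invented, echoing the coffin example above)
corpus = ("he searched the coffin . he searched the large container . "
          "he opened the coffin").split()

# Count bigram continuations: word -> Counter of following words
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict(prev: str) -> str:
    # Most frequent continuation seen after `prev` in the training text
    return bigrams[prev].most_common(1)[0][0]

print(predict("the"))       # -> "coffin" (seen twice vs "large" once)
print(predict("searched"))  # -> "the"
```

Note how extra context shifts the distribution: in this corpus, “coffin” dominates after “the” only because the surrounding text mentions coffins, which is exactly the effect shown in the word-prediction slides.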
GPT-3
• This model is a transformer neural network with 175 billion
parameters, requiring approximately 800 GB of storage.
• The training data consists of hundreds of billions of words from
texts from around the Internet. It has been trained by OpenAI,
which hosts it and makes it available as a service.
NLP Tasks organized according to their difficulty
Challenges in NLP
Despite NLP’s success, the field faces several challenges due
to the inherent complexity of human language:
1. Ambiguity
Human language is inherently ambiguous. A single sentence can have
multiple meanings depending on the context.
For example, “I made her duck” could mean either preparing a meal or causing
someone to crouch.
2. Common Knowledge
Humans use common knowledge—unstated facts and assumptions—
in conversations. Machines, however, struggle to understand this
unless explicitly programmed.
Examples of ambiguity in language
Challenges in NLP
3. Creativity
Language often includes metaphors, idioms, and sarcasm,
making it difficult for machines to interpret literal versus
figurative meanings.
4. Diversity of Languages
The grammatical and syntactical rules of languages vary greatly,
making it challenging to create universal NLP systems.
Machine Learning, Deep Learning, and NLP: An Overview
• NLP has evolved from heuristic, rule-based systems to approaches
leveraging machine learning (ML) and deep learning (DL).
• Early approaches focused on hand-crafted rules, but ML models,
such as Naive Bayes and SVMs, later displaced them.
• Deep learning, especially with architectures like RNNs, LSTMs,
CNNs, and Transformers (e.g., BERT), has significantly enhanced
NLP capabilities.
How NLP, ML, and DL are related
Approaches to NLP
There are three main approaches:
• Heuristics-Based NLP: Early systems based on rules and domain-specific
knowledge (e.g., lexicons, regex).
• Machine Learning for NLP: Supervised and unsupervised learning
techniques, such as Naive Bayes, SVMs, and HMMs, have been applied
to NLP tasks.
• Deep Learning for NLP: Advanced models like LSTMs and Transformers
dominate the field but still face challenges like overfitting, domain
adaptation, and interpretability.
Wrapping Up …
• What is NLP?
• History of NLP
• NLP Tasks
– NLU Applications
– NLG Applications
• What Is Language?
– Building Blocks of language
– Why is Language challenging?
• ML, DL, and NLP: An Overview
• Approaches to NLP
– Heuristics-Based NLP
– Machine Learning for NLP
– Deep Learning for NLP