HAL 9000 – (3 min)
haithem.afli@cit.ie 1
https://www.youtube.com/watch?v=ARJ8cAGm6JE
http://www.cit.ie
Computer Science Department
haithem.afli@cit.ie
@AfliHaithem
Natural Language Processing and its
applications
Dr Haithem Afli
May 7th, 2020
Online Lecture
Dr Haithem Afli - Background
§ Computer Science Lecturer at CIT (J102)
- NLP, Data Analytics and ML
§ Science Foundation Ireland Funded Investigator
- Leader of ADAPT@CIT research group
Research Interest:
- Natural Language Processing
- Social Media and UGC Analysis
- Machine Translation
- Data Analytics
§ Lecturing Experience (10+ years)
- 5 years in France
- 3 years in DCU
- 2 years in CIT
05/05/2020 3
Overview
§ Introduction to NLP
§ Language Modeling and its applications
§ The Golden Age of AI
§ Language Technologies
§ Ethical Issues (the case of Dialog systems)
Language
§ “Language, a system of conventional spoken, manual
(signed), or written symbols by means of
which human beings, as members of a social group and
participants in its culture, express themselves.
§ The functions of language include communication, the
expression of identity, play, imaginative expression,
and emotional release.” (Britannica)
If you think the language
industry is new, think again!
Rosetta Stone (British Museum)
Natural Language:
An age-old industry?
§ For as far back as we can see, humans have needed to
communicate → so the origin of the language industry is closely
intertwined with the need for communication itself
The Tower of Babel and The House of Wisdom in Baghdad (Bait-al-Hikma)
The importance of Language
Processing in modern history
Media agencies and translators interpreted the word as “treat with silent contempt” or “not take
into account” (to ignore), that is, as a categorical rejection by the Prime Minister.
The Americans understood that there would never be a diplomatic end to the war and
were naturally annoyed by what they considered the arrogant tone used in the
translation of the Japanese Prime Minister’s response. International news agencies reported to the
world that in the eyes of the Japanese government the ultimatum was “not worthy of
comment.”
Machine Translation
http://sma.adaptcentre.ie/ge16/#!/
Social Media Analysis
Haithem Afli, Sorcha McGuire, and Andy Way. 2017. Sentiment translation for low resourced languages: Experiments on Irish general election tweets. In 18th
International Conference on Computational Linguistics and Intelligent Text Processing.
Information Extraction & Sentiment Analysis
Attributes:
- zoom
- affordability
- size and weight
- flash
- ease of use
Size and weight
§ ✓ nice and compact to carry!
§ ✓ since the camera is small and light, I won't need to carry
around those heavy, bulky professional cameras either!
§ ✗ the camera feels flimsy, is plastic and very light in weight
you have to be very delicate in the handling of this
camera
Sentiment Analysis (Aspect-based)
Requested translation from Twitter (words)
Grand total from all World Cup matches
6,459,830 5,141,360 4,847,590 85,047,110
• Source → Target traffic:
• EN → ES 13,614,450 (EN to all languages: 50,545,460)
• ES → EN 5,569,200 (ES to all languages: 10,609,420)
• PT → EN 1,831,750 (PT to all languages: 4,230,880)
The 2014 FIFA World Cup was the biggest event yet for Twitter, with 672 million tweets
Top 3 languages: English, Portuguese, Spanish
UGC Machine Translation - Braziliator
UI: Sentiment pitch
Final: Germany 1-0 Argentina
3rd Place: Netherlands 3-0 Brazil
Semi-final: Argentina 0-0 Netherlands (Argentina won 4-2 on penalties)
Semi-final: Germany 7-1 Brazil
UGC Machine Translation - Braziliator
Now if we return to HAL 9000
https://www.youtube.com/watch?v=ARJ8cAGm6JE
HAL: What’s needed?
§ Speech recognition and synthesis
§ Knowledge of the English words involved
§ What they mean?
§ How groups of words form a sentence
§ How can we define a language?
Deterministic Definition
Chomsky Hierarchy
What is a language?
Can we define a language mathematically?
Deterministic Definition:
A language is the set of all the sentences we can
say.
Probabilistic Definition:
A language is the probability distribution over all
possible sentences
➢ Statistical Language Model
Statistical Language Model
Defined as
The normalization condition
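A standard way to write both statements, given here as an assumption rather than the slide's exact notation: a sentence s is a word sequence w_1 ... w_n, the model assigns it a probability, and the probabilities of all possible sentences sum to one.

```latex
P(s) = P(w_1, w_2, \ldots, w_n), \qquad \sum_{s} P(s) = 1
```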
Statistical Language Model
• How can we estimate the probability of a sentence in a
specific language?
• Unlike estimating the probability distribution of a die, we
cannot exhaust all the possible sentences in
a limited sample
• Idea
- break all sentences down into a limited set of substrings (n-grams)
- estimate the probability of a sentence from these substrings
  - if a sentence has many plausible substrings, then it
might be a reasonable sentence
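The n-gram idea can be sketched in a few lines of Python. This is an illustrative toy, not the lecture's own code: the function names `ngrams` and `plausible_fraction` and the tiny corpus are assumptions, and "plausible" is simplified to "was seen in the corpus".

```python
def ngrams(tokens, n):
    """Return all contiguous n-token substrings of a token list."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def plausible_fraction(sentence_tokens, seen_ngrams, n):
    """Fraction of the sentence's n-grams that were observed in the corpus."""
    grams = ngrams(sentence_tokens, n)
    if not grams:
        return 0.0
    return sum(g in seen_ngrams for g in grams) / len(grams)


# Hypothetical toy corpus; a real model would use millions of sentences.
corpus = "the king saw the boy".split()
seen_bigrams = set(ngrams(corpus, 2))

print(plausible_fraction("the boy saw the king".split(), seen_bigrams, 2))
print(plausible_fraction("boy the".split(), seen_bigrams, 2))
```

A sentence whose bigrams were mostly seen before scores high; one made of unseen bigrams scores zero. Real language models replace this seen/unseen test with estimated probabilities.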
Simplest Language Model
• Simplest way to break down a sentence
- split it into words
• Thus, the simplest language model
• Here the probability of a sentence is just the
product of the probabilities of the words in the
sentence
• This model is called the unigram language model
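Written out, with w_1 ... w_n as the words of sentence s (notation assumed here for illustration), the unigram model described above is:

```latex
P(s) = P(w_1, w_2, \ldots, w_n) \approx \prod_{i=1}^{n} p(w_i)
```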
Word Frequency
• p(w) is the word frequency
Type              Occurrences  Rank
the               3789654      1st
he                2098762      2nd
[...]
king              57897        1,356th
boy               56975        1,357th
[...]
stringyfy         5            34,589th
[...]
transducionalify  1            123,567th
p(w) = (occurrences of w) / (number of tokens)
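A minimal sketch of a unigram language model built from these relative frequencies. The names `train_unigram` and `sentence_log_prob` and the toy corpus are hypothetical. The sketch applies no smoothing, so any sentence containing an unseen word gets probability zero, and it works in log space to avoid numerical underflow on long sentences.

```python
import math
from collections import Counter


def train_unigram(tokens):
    """Estimate p(w) = occurrences of w / number of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def sentence_log_prob(sentence_tokens, model):
    """Log-probability of a sentence as the sum of its word log-probabilities."""
    log_prob = 0.0
    for word in sentence_tokens:
        p = model.get(word, 0.0)
        if p == 0.0:
            # Unseen word: the unsmoothed model assigns the sentence probability 0.
            return float("-inf")
        log_prob += math.log(p)
    return log_prob


# Hypothetical toy corpus of five tokens.
model = train_unigram("the king saw the boy".split())
print(model["the"])                                    # 2 of 5 tokens
print(math.exp(sentence_log_prob(["the", "king"], model)))
```

Because word order is ignored, "the king" and "king the" receive the same score, which is the main weakness of the unigram model.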
Unigram Language Model
Statistical Machine Translation
Andy Way
1990s-2010s: Statistical Machine Translation
• Question: How do we learn the translation model?
• First, we need a large amount of parallel data
(e.g. pairs of human-translated French/English sentences)
• Break it down further: we actually want to consider a model that includes a,
the alignment, i.e. the word-level correspondence
between French sentence x and English sentence y
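One common way to write this is the noisy-channel formulation used in classical SMT. It is assumed here for illustration, since the slide text does not spell out the formula: x is the French sentence, y the English sentence, P(y) a language model, and the translation model P(x | y) sums over all alignments a.

```latex
\hat{y} = \arg\max_{y} P(y \mid x) = \arg\max_{y} P(x \mid y)\, P(y),
\qquad
P(x \mid y) = \sum_{a} P(x, a \mid y)
```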
How does SMT Work?
Fei-Fei Li
Neural MT
§ Paradigm Shift: Machine Learning → Deep Learning
§ Hardware: CPU → GPU
§ Open-Source Tools:
- Theano
- TensorFlow
- PyTorch
- MXNet
§ Slower Training & Decoding
HAL: What’s needed?
§ Dialog
§ It is polite to respond, even if you’re planning to kill someone.
§ It is polite to pretend to want to be cooperative (I’m afraid, I
can’t…)
Conversational Agents
AKA Dialog Agents
§ Phone-based Personal Assistants
SIRI, Alexa, Cortana, Google Assistant
§ Talking to your car
§ Communicating with robots
§ Clinical uses for mental health
§ Chatting for fun
Two classes of systems
1. Chatbots
2. (Goal-based) Dialog agents
- SIRI, interfaces to cars, robots,
- booking flights or restaurants
The word "chatbots" is sometimes used in the popular
press for both. We'll use it only for #1.
But before Woebot...
Eliza: Weizenbaum (1966)
Men are all alike.
IN WHAT WAY
They're always bugging us about something or other.
CAN YOU THINK OF A SPECIFIC EXAMPLE
Well, my boyfriend made me come here.
YOUR BOYFRIEND MADE YOU COME HERE
He says I'm depressed much of the time.
I AM SORRY TO HEAR YOU ARE DEPRESSED
...
WHO ELSE IN YOUR FAMILY TAKES CARE OF YOU?
My father
YOUR FATHER
You are like my father in some ways.
WHAT RESEMBLANCE DO YOU SEE
You are not very aggressive but I think you don’t want me to notice that.
WHAT MAKES YOU THINK I AM NOT AGGRESSIVE
You don’t argue with me.
WHY DO YOU THINK I DON’T ARGUE WITH YOU
You are afraid of me.
DOES IT PLEASE YOU TO BELIEVE I’M AFRAID OF YOU
My father is afraid of everybody.
WHAT ELSE COMES TO MIND WHEN YOU THINK OF YOUR FATHER
Bullies.
DOES THAT HAVE ANYTHING TO DO WITH THE FACT THAT YOUR BOYFRIEND MADE YOU
COME HERE
How does it work?
Eliza uses this one weird trick
That trick: be a Rogerian psychologist
§ Draw the patient out by reflecting patient’s
statements back at them
§ Rare type of conversation in which one can
“assume the pose of knowing almost
nothing of the real world”
Eliza pattern/transform rules
(0 YOU 0 ME) [pattern]
→
(WHAT MAKES YOU THINK I 3 YOU) [transform]
0 means Kleene star *
The 3 is the constituent # in pattern
You hate me
WHAT MAKES YOU THINK I HATE YOU
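The rule above can be approximated with a regular expression. This is a sketch of the single rule on the slide, not Weizenbaum's original implementation: the name `eliza_respond` is hypothetical, each `0` wildcard is modelled as `.*`, and the regex capture groups stand in for the numbered constituents.

```python
import re

# (0 YOU 0 ME): constituents 1 and 3 are wildcards, 2 and 4 are literal words.
PATTERN = re.compile(r"^(.*?)\b(YOU)\b(.*)\b(ME)\b\W*$")


def eliza_respond(utterance):
    """Apply the (0 YOU 0 ME) rule; return None if the pattern does not match."""
    match = PATTERN.match(utterance.upper())
    if match is None:
        return None
    # Constituent 3 is whatever was said between YOU and ME.
    constituent_3 = " ".join(match.group(3).split())
    parts = ["WHAT MAKES YOU THINK I", constituent_3, "YOU"]
    return " ".join(part for part in parts if part)


print(eliza_respond("You hate me"))
print(eliza_respond("Men are all alike."))
```

A full ELIZA keeps a ranked list of such rules keyed on keywords and falls back to generic prompts when nothing matches; here a non-matching input simply returns `None`.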
Dan Jurafsky
Some implications
§ People became deeply emotionally involved with the
program
§ Weizenbaum tells the story of his secretary who would ask
Weizenbaum to leave the room when she talked with ELIZA
§ When he suggested that he might want to store all the ELIZA
conversations for later analysis, people immediately pointed
out the privacy implications
§ Suggesting that they were having quite private conversations with
ELIZA
§ Anthropomorphism and the Heider-Simmel Illusion
§ https://www.youtube.com/watch?v=8FIEZXMUM2I
Components of current SIRI-style architectures
[Figure: block diagram of a SIRI-style architecture. Input from the user passes through speech recognition (word sequence), semantic interpretation / NL understanding (inferred user intent), and dialog management, which asks a clarifying question while elements are missing and moves to action selection (best outcome) once the request is complete; output to the user goes through speech synthesis. The components draw on an interaction model, interaction context, and world knowledge.]
Figure from Jerome Bellegarda
NLP in the Golden Age of AI
NLP has an AI aspect to it.
§ We’re often dealing with ill-defined problems
§ We don’t often come up with exact solutions/algorithms
§ We can’t let either of those facts get in the way of making progress
Artificial intelligence (AI)
Beyond the Hype
Graph from Tobias Bohnhoff
https://nativevideotube.blogspot.com/
NLP - the language industry
The Rise of Natural Language Processing
(NLP), and How it is Changing the Way we
Retrieve Information
The 'creator' of Bitcoin, Satoshi Nakamoto, is
the world's most elusive billionaire. Very few
people outside of the Department of
Homeland Security know Satoshi's real
name. Satoshi has taken great care to keep
his identity secret employing the latest
encryption and obfuscation methods in his
communications.
Despite these efforts, Satoshi Nakamoto gave
investigators the only tool they needed to find him:
his own words. Using NLP, the NSA (and everyone!)
was able to compare texts to determine authorship
of a particular work.
More info: https://tech.slashdot.org/story/17/08/28/1725232/how-the-nsa-identified-satoshi-nakamoto
Timeline of (modern) AI
Graph from The University Of Queensland Brain Institute
[Timeline labels: the first AI winter; the second AI winter]
Including CIT MSc in AI
https://www.cit.ie/course/CRKARIN9
The first AI winter
By 1964, the National Research Council (NRC)
had become concerned about the lack of progress
and formed the Automatic Language Processing
Advisory Committee (ALPAC) to look into the
problem.
They concluded, in a famous 1966 report, that
machine translation was more expensive, less
accurate and slower than human translation.
After spending some 20 million dollars, the NRC
ended all support.
Image from Wikipedia
The second AI winter
In 1984, John McCarthy criticized expert systems because they lacked common sense
and knowledge about their own limitations.
Schwarz, Director of DARPA ISTO from 1987 to 1989, concluded that AI research has
always had
“… very limited success in particular areas, followed immediately by failure to reach the
broader goal at which these initial successes seem at first to hint…”.
➢ Decrease in funding in AI research.
➢ Many AI companies closed their doors.
➢ Attendance at the AAAI conference, which attracted over 6000
visitors in 1986, quickly decreased to just 2000
by 1991.
The survivors
The Deep Learning Godfathers
Turing Award given for:
• “The conceptual and engineering breakthroughs that have made deep neural
networks a critical component of computing.”
Deep Learning Era
2014: Generative Adversarial
Networks
§ The neural network at the top is the discriminator, and its task is to distinguish the
training set’s real information from the generator’s creations.
§ In the simplest GAN structure, the generator starts with random data and learns to transform
this noise into information that matches the distribution of the real data.
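This two-player setup is commonly written as the minimax objective from the original 2014 GAN paper (Goodfellow et al.). The symbols are not on the slide and are introduced here: D is the discriminator, G the generator, p_data the real-data distribution, and p_z the noise distribution.

```latex
\min_{G} \max_{D} \;
\mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator maximizes this value by scoring real samples near 1 and generated samples near 0, while the generator minimizes it by producing samples the discriminator scores as real.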
Do you know this person?
https://thispersondoesnotexist.com/
2018: StyleGAN
Failure Cases
CycleGAN (Zhu et al., 2017)
DeepFake
§ The development of deepfakes has taken place to a large extent in two settings:
research at academic institutions, and development by amateurs in online communities.
GAN
Applications of GANs
➢ GANs for Image Editing
➢ Using GANs for Security
(SSGAN: Secure Steganography Based on GAN)
➢ De-aging Robert De Niro!
(Martin Scorsese spent millions of Netflix's money
to digitally de-age De Niro, Pacino, and Pesci so they could portray these men throughout
different parts of their lives.)
Attending the Unattainable
Challenges in translating User-Generated
Content
Why else is natural language
understanding difficult?
§ non-standard English
- Great job @justinbieber! Were
SOO PROUD of what youve
accomplished! U taught us 2
#neversaynever & you yourself
should never give up either♥
§ segmentation issues
- the New York-New Haven Railroad
§ idioms
- dark horse
- get cold feet
- lose face
- throw in the towel
§ neologisms
- unfriend
- Retweet
- bromance
But that’s what makes it fun!
Making progress on this problem…
§ The task is difficult! What tools do we need?
§ Knowledge about language
§ Knowledge about the world
§ A way to combine knowledge sources
§ How we generally do this:
§ Probabilistic models built from language data
§ P(“maison” → “house”) high
§ P(“L’avocat général” → “the general avocado”) low
§ Luckily, rough text features can often do half the job.
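As a rough illustration of a probabilistic model built from language data, translation probabilities can be estimated by relative frequency over word-aligned pairs. Everything here is hypothetical: the function name `estimate_translation_probs`, the toy aligned pairs, and their counts are invented for the example, and real systems learn alignments rather than being given them.

```python
from collections import Counter


def estimate_translation_probs(aligned_pairs):
    """Estimate P(target | source) by relative frequency over aligned word pairs."""
    pair_counts = Counter(aligned_pairs)
    source_counts = Counter(source for source, _ in aligned_pairs)
    probs = {}
    for (source, target), count in pair_counts.items():
        probs.setdefault(source, {})[target] = count / source_counts[source]
    return probs


# Hypothetical toy data: each tuple is one observed (French, English) alignment.
pairs = (
    [("maison", "house")] * 9
    + [("maison", "home")] * 1
    + [("avocat", "lawyer")] * 8
    + [("avocat", "avocado")] * 2
)
probs = estimate_translation_probs(pairs)
print(probs["maison"]["house"])    # frequent pairing, high probability
print(probs["avocat"]["avocado"])  # rare pairing, low probability
```

With enough data, the context-appropriate translation of an ambiguous word such as "avocat" dominates the counts, which is the intuition behind the high and low probabilities on the slide.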
Dan Jurafsky and James H. Martin
➢ Pre-trained models
Pre-trained models: BERT
BERT makes use of the Transformer, an
architecture built on an attention mechanism that learns
contextual relations between words (or
sub-words) in a text.
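The core of that attention mechanism is scaled dot-product attention. The sketch below is a single-head, pure-Python version with no learned projections, written only to show the computation; it is not BERT's implementation, and the function names are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """For each query, return a weighted average of the value vectors.

    Weights are softmax(q . k / sqrt(d_k)) over all keys, so each output
    mixes information from every position according to its relevance.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        dim = len(values[0])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return outputs


# Two identical keys give equal weights, so the output is the mean of the values.
print(scaled_dot_product_attention(
    [[1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]]
))
```

In a Transformer the queries, keys, and values are learned linear projections of the token embeddings, and many such heads run in parallel across several layers.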
From BERT to ALBERT
• BERT (Google)
• XLNet (Google/CMU)
• RoBERTa (Facebook)
• DistilBERT (HuggingFace)
• CTRL (Salesforce)
• GPT-2 (OpenAI)
• Megatron (NVIDIA)
• ALBERT (Google)
2019: OpenAI GPT-2
Challenges with automatically
generated texts
Addressing the commonsense problem
Cunxiang Wang, Shuailong Liang , Yue Zhang , Xiaonan Li and Tian Gao. Does It Make Sense?
And Why? A Pilot Study for Sense Making and Explanation.
Language Technology
mostly solved
§ Spam detection
- Let’s go to Agra! ✓
- Buy V1AGRA … ✗
§ Part-of-speech (POS) tagging
- Colorless green ideas sleep furiously.
- ADJ ADJ NOUN VERB ADV
§ Named entity recognition (NER)
- Einstein met with UN officials in Princeton
- PERSON ORG LOC
making good progress
§ Sentiment analysis
- Best roast chicken in San Francisco!
- The waiter ignored us for 20 minutes.
§ Coreference resolution
- Carter told Mubarak he shouldn’t run again.
§ Word sense disambiguation (WSD)
- I need new batteries for my mouse.
§ Parsing
- I can see Alcatraz from the window!
§ Machine translation (MT)
- The 13th Shanghai International Film Festival…
§ Information extraction (IE)
- You’re invited to our dinner party, Friday May 27 at 8:30
- Party, May 27, add
still really hard
§ Question answering (QA)
- Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?
§ Paraphrase
- XYZ acquired ABC yesterday
- ABC has been taken over by XYZ
§ Summarization
- The Dow Jones is up; The S&P500 jumped; Housing prices rose → Economy is good
§ Dialog
- Where is Citizen Kane playing in SF?
- Castro Theatre at 7:30. Do you want a ticket?
Real Success: IBM’s Watson
§ Won Jeopardy on February 16, 2011!
WILLIAM WILKINSON’S
“AN ACCOUNT OF THE PRINCIPALITIES OF
WALLACHIA AND MOLDOVIA”
INSPIRED THIS AUTHOR’S
MOST FAMOUS NOVEL
Bram Stoker
Real Success: Watson on Jeopardy
§ https://www.youtube.com/watch?v=WFR3lOm_xhE
Ethical Issues in Dialog System Design
§ Machine learning systems replicate biases that occurred in
the training data.
§ Microsoft's Tay chatbot
§ Went live on Twitter in 2016
§ Taken offline 16 hours later
§ In that time it had started posting racial slurs, conspiracy
theories, and personal attacks
§ Learned from user interactions (Neff and Nagy 2016)
The Twitter profile picture of Tay
Fails ..
Ethical Issues in Dialog System Design
§ Machine learning systems replicate biases that occurred in
the training data.
§ Dialog datasets
§ Henderson et al. (2017) examined standard datasets (Twitter, Reddit,
movie dialogs)
§ Found examples of hate speech, offensive language, and bias
§ Both in the original training data, and in the output of chatbots trained
on the data.
Ethical Issues in Dialog System Design: Privacy
§ Remember this was noticed in the days of Weizenbaum
§ Agents may record sensitive data
§ (e.g. “Computer, turn on the lights” [answers the phone: “Hi, yes, my
password is...”]),
§ Which may then be used to train a seq2seq conversational
model.
§ Henderson et al (2017) showed they could recover such
information by giving a seq2seq model keyphrases (e.g.,
"password is")
Ethical Issues in Dialog System Design: Gender
equality
§ Dialog agents are overwhelmingly given female names,
perpetuating the female servant stereotype (Paolino, 2017).
§ Responses from commercial dialog agents when users use
sexually harassing language (Fessler 2017):
Speech and Language Processing (3rd ed. draft)
Dan Jurafsky and James H. Martin
Addressing real-world challenges
§ AI Technologies
- Natural Language Processing (NLP)
- Social Media and UGC Analysis
- Computer Vision (CV)
- Machine/Deep Learning (ML-DL)
§ Applications
- Digital Humanities
- Fintech
- Digital Health and Life-science
- Social Science and Psychology
- Security and Cybersecurity
Thank you

More Related Content

Similar to Introduction to Natural Language Processing

My Dream Vacation Essay Paris
My Dream Vacation Essay ParisMy Dream Vacation Essay Paris
My Dream Vacation Essay ParisAndrea Lawson
 
Pythonlearn-01-Intro.pptx
Pythonlearn-01-Intro.pptxPythonlearn-01-Intro.pptx
Pythonlearn-01-Intro.pptxMrHackerxD
 
How Was Your Weekend?
How Was Your Weekend?How Was Your Weekend?
How Was Your Weekend?Ben Seymour
 
Ielts Essay Letter Topics
Ielts Essay Letter TopicsIelts Essay Letter Topics
Ielts Essay Letter TopicsJackie Jones
 
Inclusive design principles for WordPress
Inclusive design principles for WordPressInclusive design principles for WordPress
Inclusive design principles for WordPressJoe Ortenzi
 
A living hell - lessons learned in eight years of parsing real estate data
A living hell - lessons learned in eight years of parsing real estate data  A living hell - lessons learned in eight years of parsing real estate data
A living hell - lessons learned in eight years of parsing real estate data lokku
 
How to create effective smartphone video | Scotland Networking Group | 12 Oct...
How to create effective smartphone video | Scotland Networking Group | 12 Oct...How to create effective smartphone video | Scotland Networking Group | 12 Oct...
How to create effective smartphone video | Scotland Networking Group | 12 Oct...CharityComms
 
A Brief Guide to IT Project Management
A Brief Guide to IT Project Management A Brief Guide to IT Project Management
A Brief Guide to IT Project Management Habermann Frank
 
Narrative Essay If I Were Invisible
Narrative Essay If I Were InvisibleNarrative Essay If I Were Invisible
Narrative Essay If I Were InvisiblePatty Loen
 
Public speaking - FDP tech leads summit - 2018-04-30
Public speaking - FDP tech leads summit - 2018-04-30Public speaking - FDP tech leads summit - 2018-04-30
Public speaking - FDP tech leads summit - 2018-04-30FrĂŠdĂŠric Harper
 
BAQMaR - Conference Evening
BAQMaR - Conference EveningBAQMaR - Conference Evening
BAQMaR - Conference EveningBAQMaR
 
Culture management distribution
Culture management distributionCulture management distribution
Culture management distributionSamitha Jayaweera
 
How To Write A Good Cause And Effect Essay
How To Write A Good Cause And Effect EssayHow To Write A Good Cause And Effect Essay
How To Write A Good Cause And Effect EssayJennifer Perry
 
Violent Video Games Effects Argumentative Essay
Violent Video Games Effects Argumentative EssayViolent Video Games Effects Argumentative Essay
Violent Video Games Effects Argumentative EssayDiana Hole
 
15 Tips to Scale a Large AI/ML Workshop - Both Online and In-Person
15 Tips to Scale a Large AI/ML Workshop - Both Online and In-Person15 Tips to Scale a Large AI/ML Workshop - Both Online and In-Person
15 Tips to Scale a Large AI/ML Workshop - Both Online and In-PersonChris Fregly
 
Writing for the Web, It's Not the Same!
Writing for the Web, It's Not the Same!Writing for the Web, It's Not the Same!
Writing for the Web, It's Not the Same!Charles Crouch
 
Use Of Math In Daily Life Essay
Use Of Math In Daily Life EssayUse Of Math In Daily Life Essay
Use Of Math In Daily Life EssayTrina Martin
 
Presentation Visuals
Presentation VisualsPresentation Visuals
Presentation Visualsbthat
 
English Essay Writing Help Online From Professional Essay Tutors
English Essay Writing Help Online From Professional Essay TutorsEnglish Essay Writing Help Online From Professional Essay Tutors
English Essay Writing Help Online From Professional Essay TutorsJennifer Wright
 

Similar to Introduction to Natural Language Processing (20)

My Dream Vacation Essay Paris
My Dream Vacation Essay ParisMy Dream Vacation Essay Paris
My Dream Vacation Essay Paris
 
Pythonlearn-01-Intro.pptx
Pythonlearn-01-Intro.pptxPythonlearn-01-Intro.pptx
Pythonlearn-01-Intro.pptx
 
How Was Your Weekend?
How Was Your Weekend?How Was Your Weekend?
How Was Your Weekend?
 
Ielts Essay Letter Topics
Ielts Essay Letter TopicsIelts Essay Letter Topics
Ielts Essay Letter Topics
 
Inclusive design principles for WordPress
Inclusive design principles for WordPressInclusive design principles for WordPress
Inclusive design principles for WordPress
 
A living hell - lessons learned in eight years of parsing real estate data
A living hell - lessons learned in eight years of parsing real estate data  A living hell - lessons learned in eight years of parsing real estate data
A living hell - lessons learned in eight years of parsing real estate data
 
The Soft Side of Software Development / Devoxx 2019
The Soft Side of Software Development / Devoxx 2019The Soft Side of Software Development / Devoxx 2019
The Soft Side of Software Development / Devoxx 2019
 
How to create effective smartphone video | Scotland Networking Group | 12 Oct...
How to create effective smartphone video | Scotland Networking Group | 12 Oct...How to create effective smartphone video | Scotland Networking Group | 12 Oct...
How to create effective smartphone video | Scotland Networking Group | 12 Oct...
 
A Brief Guide to IT Project Management
A Brief Guide to IT Project Management A Brief Guide to IT Project Management
A Brief Guide to IT Project Management
 
Narrative Essay If I Were Invisible
Narrative Essay If I Were InvisibleNarrative Essay If I Were Invisible
Narrative Essay If I Were Invisible
 
Public speaking - FDP tech leads summit - 2018-04-30
Public speaking - FDP tech leads summit - 2018-04-30Public speaking - FDP tech leads summit - 2018-04-30
Public speaking - FDP tech leads summit - 2018-04-30
 
BAQMaR - Conference Evening
BAQMaR - Conference EveningBAQMaR - Conference Evening
BAQMaR - Conference Evening
 
Culture management distribution
Culture management distributionCulture management distribution
Culture management distribution
 
How To Write A Good Cause And Effect Essay
How To Write A Good Cause And Effect EssayHow To Write A Good Cause And Effect Essay
How To Write A Good Cause And Effect Essay
 
Violent Video Games Effects Argumentative Essay
Violent Video Games Effects Argumentative EssayViolent Video Games Effects Argumentative Essay
Violent Video Games Effects Argumentative Essay
 
15 Tips to Scale a Large AI/ML Workshop - Both Online and In-Person
15 Tips to Scale a Large AI/ML Workshop - Both Online and In-Person15 Tips to Scale a Large AI/ML Workshop - Both Online and In-Person
15 Tips to Scale a Large AI/ML Workshop - Both Online and In-Person
 
Writing for the Web, It's Not the Same!
Writing for the Web, It's Not the Same!Writing for the Web, It's Not the Same!
Writing for the Web, It's Not the Same!
 
Use Of Math In Daily Life Essay
Use Of Math In Daily Life EssayUse Of Math In Daily Life Essay
Use Of Math In Daily Life Essay
 
Presentation Visuals
Presentation VisualsPresentation Visuals
Presentation Visuals
 
English Essay Writing Help Online From Professional Essay Tutors
English Essay Writing Help Online From Professional Essay TutorsEnglish Essay Writing Help Online From Professional Essay Tutors
English Essay Writing Help Online From Professional Essay Tutors
 

More from Haithem Afli

How NLP is reshaping Fintech
How NLP is reshaping Fintech How NLP is reshaping Fintech
How NLP is reshaping Fintech Haithem Afli
 
Looking Beyond the AI & IoT Research and Industrial Opportunities: How two Br...
Looking Beyond the AI & IoTResearch and Industrial Opportunities:How two Br...Looking Beyond the AI & IoTResearch and Industrial Opportunities:How two Br...
Looking Beyond the AI & IoT Research and Industrial Opportunities: How two Br...Haithem Afli
 
AI Meets Digital Health, Social Science and AgriTech
AI Meets Digital Health, Social Science and AgriTechAI Meets Digital Health, Social Science and AgriTech
AI Meets Digital Health, Social Science and AgriTechHaithem Afli
 
Affective Analytics and Visualization for Ensemble event-driven stock market ...
Affective Analytics and Visualization for Ensemble event-driven stock market ...Affective Analytics and Visualization for Ensemble event-driven stock market ...
Affective Analytics and Visualization for Ensemble event-driven stock market ...Haithem Afli
 
Natural Language Engineering in the Golden Age of Artificial Intelligence
 Natural Language Engineering in the Golden Age of Artificial Intelligence Natural Language Engineering in the Golden Age of Artificial Intelligence
Natural Language Engineering in the Golden Age of Artificial IntelligenceHaithem Afli
 
Industrial Internet Consortium 2019
Industrial Internet Consortium 2019Industrial Internet Consortium 2019
Industrial Internet Consortium 2019Haithem Afli
 
Analytics2017
Analytics2017Analytics2017
Analytics2017Haithem Afli
 
Présentation de thèse Haithem AFLI
Présentation de thèse Haithem AFLIPrésentation de thèse Haithem AFLI
Présentation de thèse Haithem AFLIHaithem Afli
 
Parallel text extraction from multimodal comparable corpora
Introduction to Natural Language Processing

  • 1. HAL 9000 – (3 min) haithem.afli@cit.ie 1 https://www.youtube.com/watch?v=ARJ8cAGm6JE
  • 2. http://www.cit.ie Computer Science Department Haithem.afli@cit.ie @AfliHaithem Natural Language Processing and its applications Dr Haithem Afli May 7th, 2020 Online Lecture
  • 3. Dr Haithem Afli - Background § Computer Science Lecturer at CIT (J102) - NLP, Data Analytics and ML § Science Foundation Ireland Funded Investigator - Leader of ADAPT@CIT research group Research Interest: - Natural Language Processing - Social Media and UGC Analysis - Machine Translation - Data Analytics § Lecturing Experience (10+ years) - 5 years in France - 3 years in DCU - 2 years in CIT 05/05/2020 3
  • 4. Overview § Introduction to NLP § Language Modeling and its applications § The Golden Age of AI § Language Technologies § Ethical Issues (the case of Dialog systems) haithem.afli@cit.ie 4
  • 5. Language § “Language, a system of conventional spoken, manual (signed), or written symbols by means of which human beings, as members of a social group and participants in its culture, express themselves. § The functions of language include communication, the expression of identity, play, imaginative expression, and emotional release.” (Britannica)
  • 6. If you think the language industry is new haithem.afli@cit.ie 6
  • 7. If you think the language industry is new, think again! haithem.afli@cit.ie 7 Rosetta Stone (British Museum)
  • 8. Natural Language: An age-old industry? § For as far back as we can see, humans have needed to communicate → so the origin of the language industry is closely intertwined with the need for communication itself. The Tower of Babel and The House of Wisdom in Baghdad (Bait-al-Hikma)
  • 9. The importance of Language Processing in modern history haithem.afli@cit.ie 9 Media agencies and translators interpreted the word “treat with silent contempt” or “take into account” (to ignore), as the categorical rejection by the Prime Minister. The Americans understood that there would never be a diplomatic end to the war and were naturally annoyed by what they considered the arrogant tone used in the Japanese translation of the Prime Minister’s response. International news agencies reported to the world that in the eyes of the Japanese government the ultimatum was “not worthy of comment.”
  • 11. Social Media Analysis http://sma.adaptcentre.ie/ge16/#!/ Haithem Afli, Sorcha McGuire, and Andy Way. 2017. Sentiment translation for low resourced languages: Experiments on Irish general election tweets. In 18th International Conference on Computational Linguistics and Intelligent Text Processing.
  • 12. Information Extraction & Sentiment Analysis § nice and compact to carry! § since the camera is small and light, I won't need to carry around those heavy, bulky professional cameras either! § the camera feels flimsy, is plastic and very light in weight you have to be very delicate in the handling of this camera Size and weight Attributes: zoom affordability size and weight flash ease of use ✓ ✗ ✓ haithem.afli@cit.ie 12
  • 14. UGC Machine Translation - Braziliator. The 2014 FIFA World Cup was the biggest event yet for Twitter, with 672 million tweets. Requested translation from Twitter (words), grand total from all World Cup matches: 6,459,830 5,141,360 4,847,590 85,047,110. Top 3 languages: English, Portuguese, Spanish. • Source → Target traffic: • EN → ES 13,614,450 (EN to all languages: 50,545,460) • ES → EN 5,569,200 (ES to all languages: 10,609,420) • PT → EN 1,831,750 (PT to all languages: 4,230,880)
  • 16. UGC Machine Translation - Braziliator. UI: Sentiment pitch. Final: Germany 1-0 Argentina. 3rd Place: Netherlands 3-0 Brazil. Semi-final: Argentina 0-0 Netherlands (Argentina won 4-2 on penalties). Semi-final: Germany 7-1 Brazil.
  • 17. UGC Machine Translation - Braziliator 17 Semi-final: Germany 7-1 Brazil
  • 18. Now if we return to HAL 9000 haithem.afli@cit.ie 18 https://www.youtube.com/watch?v=ARJ8cAGm6JE
  • 19. HAL: What’s needed? § Speech recognition and synthesis § Knowledge of the English words involved § What they mean? § How groups of words form a sentence § How can we define a language? haithem.afli@cit.ie 19
  • 22. What is a language? Can we define a language mathematically? Deterministic definition: a language is the set of all the sentences we can say. Probabilistic definition: a language is the probability distribution over all possible sentences → Statistical Language Model
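  • 23. Statistical Language Model. Defined as a probability distribution p(s) over all possible sentences s. The normalization condition: the probabilities of all sentences sum to one, i.e. sum over s of p(s) = 1.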
  • 24. Statistical Language Model • How can we estimate the probability of a sentence in a specific language? • Unlike estimating the probability distribution of a die, we cannot exhaust all possible sentences with a limited sample
  • 25. Statistical Language Model • How can we estimate the probability of a sentence in a specific language? • Unlike estimating the probability distribution of a die, we cannot exhaust all possible sentences with a limited sample • Idea - Break all sentences down into a limited set of substrings (n-grams) - Estimate the probability of a sentence from these substrings - If a sentence has many plausible substrings, then it might be a reasonable sentence
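
The n-gram idea on slide 25 can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the toy corpus, the choice of bigrams (n = 2), and the helper names `ngrams` and `known_bigram_ratio` are all assumptions made for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Toy corpus, invented purely for illustration.
corpus = [
    "the boy saw the king".split(),
    "the king saw the boy".split(),
]

# Count every bigram observed in the corpus.
bigram_counts = Counter(bg for sent in corpus for bg in ngrams(sent, 2))


def known_bigram_ratio(sentence):
    """Fraction of the sentence's bigrams that also occur in the corpus.

    A crude plausibility score: a sentence made of many attested
    substrings is more likely to be a reasonable sentence.
    """
    bgs = ngrams(sentence.split(), 2)
    if not bgs:
        return 0.0
    return sum(bg in bigram_counts for bg in bgs) / len(bgs)
```

A sentence whose bigrams were all seen in the corpus scores 1.0, while a scrambled word order scores lower, even though both use the same words. A real n-gram model would turn these counts into smoothed conditional probabilities rather than a simple ratio.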
  • 26. Simplest Language Model • The simplest way to break down a sentence is to split it into words • Thus, the simplest language model is p(s) = p(w1) × p(w2) × … × p(wn) for a sentence s = w1 w2 … wn • Here the probability of a sentence is just the product of the probabilities of the words in the sentence • This model is called the unigram language model
  • 27. Word Frequency • p(w) is the word frequency: p(w) = (occurrences of w) / (number of tokens). Type, Occurrences, Rank: the, 3789654, 1st; he, 2098762, 2nd; [...]; king, 57897, 1,356th; boy, 56975, 1,357th; [...]; stringyfy, 5, 34,589th; [...]; transducionalify, 1, 123,567th
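
As a rough sketch of the unigram model from slides 26 and 27, the snippet below estimates p(w) as a relative frequency and scores a sentence as the product of its word probabilities. The tiny token list is made up for the example, and working in log space is a common practical choice rather than something the slides prescribe.

```python
import math
from collections import Counter

# Toy training text, invented for illustration.
tokens = "the king and the boy saw the king".split()
counts = Counter(tokens)
total = len(tokens)


def p_word(w):
    """Relative frequency: occurrences of w / number of tokens."""
    return counts[w] / total


def log_p_sentence(sentence):
    """Unigram log-probability: sum of log p(w) over the words.

    Returns -inf if any word was never seen (no smoothing here).
    """
    logp = 0.0
    for w in sentence.split():
        pw = p_word(w)
        if pw == 0.0:
            return float("-inf")
        logp += math.log(pw)
    return logp
```

Because the model multiplies word probabilities independently, "the king" and "king the" get the same score: the unigram model ignores word order, which is what longer n-grams try to fix.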
  • 30. 1990s-2010s: Statistical Machine Translation • Question: How do we learn the translation model? • First, we need a large amount of parallel data (e.g. pairs of human-translated French/English sentences)
  • 31. 1990s-2010s: Statistical Machine Translation • Question: How do we learn the translation model? • First, we need a large amount of parallel data (e.g. pairs of human-translated French/English sentences) • Break it down further: we actually want to consider P(x, a | y), where a is the alignment, i.e. the word-level correspondence between French sentence x and English sentence y
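
For reference, the quantities on slides 30 and 31 are usually written in the standard noisy-channel form below. The Bayes-rule step and the symbol $\hat{y}$ are not in the transcript; they are added here as the conventional formulation, with $x$ the French sentence, $y$ the English sentence, and $a$ the alignment.

```latex
% Find the best English sentence y for a French sentence x:
\hat{y} = \arg\max_{y} P(y \mid x)
        = \arg\max_{y} \underbrace{P(x \mid y)}_{\text{translation model}}
                       \; \underbrace{P(y)}_{\text{language model}}

% The translation model sums over the latent word alignments a:
P(x \mid y) = \sum_{a} P(x, a \mid y)
```

The translation model is learned from the parallel data, while the language model can be estimated from monolingual English text alone.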
  • 32. How does SMT Work? haithem.afli@cit.ie 32 Fei-Fei Li
  • 33. Neural MT § Paradigm Shift: Machine Learning → Deep Learning § Hardware: CPU → GPU § Open-Source Tools: § Theano § TensorFlow § PyTorch § MXNet § Slower Training & Decoding
  • 34. HAL: What’s needed? § Dialog § It is polite to respond, even if you’re planning to kill someone. § It is polite to pretend to want to be cooperative (I’m afraid, I can’t…)
  • 35. Conversational Agents AKA Dialog Agents § Phone-based Personal Assistants SIRI, Alexa, Cortana, Google Assistant § Talking to your car § Communicating with robots § Clinical uses for mental health § Chatting for fun haithem.afli@cit.ie 35
  • 36. Two classes of systems 1. Chatbots 2. (Goal-based) Dialog agents - SIRI, interfaces to cars, robots, - booking flights or restaurants The word "chatbots" is sometimes used in the popular press for both. We'll use it only for #1. haithem.afli@cit.ie 36
  • 38. But before Woebot...
  • 39. Eliza: Weizenbaum (1966) Men are all alike. IN WHAT WAY They're always bugging us about something or other. CAN YOU THINK OF A SPECIFIC EXAMPLE Well, my boyfriend made me come here. YOUR BOYFRIEND MADE YOU COME HERE He says I'm depressed much of the time. I AM SORRY TO HEAR YOU ARE DEPRESSED ... WHO ELSE IN YOUR FAMILY TAKES CARE OF YOU? My father YOUR FATHER
  • 40. You are like my father in some ways. WHAT RESEMBLANCE DO YOU SEE You are not very aggressive but I think you don’t want me to notice that. WHAT MAKES YOU THINK I AM NOT AGGRESSIVE You don’t argue with me. WHY DO YOU THINK I DON’T ARGUE WITH YOU You are afraid of me. DOES IT PLEASE YOU TO BELIEVE I’M AFRAID OF YOU My father is afraid of everybody. WHAT ELSE COMES TO MIND WHEN YOU THINK OF YOUR FATHER Bullies. DOES THAT HAVE ANYTHING TO DO WITH THE FACT THAT YOUR BOYFRIEND MADE YOU COME HERE How does it work? Eliza uses this one weird trick haithem.afli@cit.ie 40
  • 41. That trick: be a Rogerian psychologist haithem.afli@cit.ie 41
  • 42. That trick: be a Rogerian psychologist § Draw the patient out by reflecting patient’s statements back at them § Rare type of conversation in which one can “assume the pose of knowing almost nothing of the real world” haithem.afli@cit.ie 42
  • 43. Eliza pattern/transform rules: (0 YOU 0 ME) [pattern] → (WHAT MAKES YOU THINK I 3 YOU) [transform]. 0 means Kleene star (*). The 3 is the constituent number in the pattern. Example: "You hate me" → "WHAT MAKES YOU THINK I HATE YOU". (Dan Jurafsky)
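
The single rule on slide 43 can be approximated with a regular expression. This is only a sketch of that one pattern/transform pair, not Weizenbaum's original implementation: the names `PATTERN` and `respond` are hypothetical, and mapping each `0` to a lazy wildcard group is an assumption about how to express the Kleene star in Python's `re` module.

```python
import re

# (0 YOU 0 ME): each "0" matches any run of words, so it becomes a
# lazy wildcard group. Constituents are numbered 1=0, 2=YOU, 3=0, 4=ME,
# which means constituent 3 is the second wildcard (regex group 2).
PATTERN = re.compile(r"^(.*?)\bYOU\b(.*?)\bME\b$", re.IGNORECASE)


def respond(utterance):
    """Apply the rule (0 YOU 0 ME) -> (WHAT MAKES YOU THINK I 3 YOU).

    Returns None when the pattern does not match.
    """
    text = utterance.strip().rstrip(".!?")
    m = PATTERN.match(text)
    if m is None:
        return None
    third = m.group(2).strip().upper()
    return f"WHAT MAKES YOU THINK I {third} YOU"
```

Calling `respond("You hate me")` reproduces the slide's example. A fuller ELIZA would hold many such rules ranked by keyword and would also swap pronouns (for instance "are" to "am"), which this sketch does not attempt.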
  • 44. Some implications § People became deeply emotionally involved with the program § Weizenbaum tells the story of his secretary, who would ask him to leave the room when she talked with ELIZA § When he suggested that he might want to store all the ELIZA conversations for later analysis, people immediately pointed out the privacy implications § This suggests that they were having quite private conversations with ELIZA § Anthropomorphism and the Heider-Simmel Illusion § https://www.youtube.com/watch?v=8FIEZXMUM2I
  • 45. Components of current SIRI-style architectures. [Figure from Jerome Bellegarda. Components shown: Input from User, Speech Recognition, Word Sequence, NL Understanding, Semantic Interpretation, LPM Training, Inferred User Intent, Dialog Management (complete or incomplete?), Missing Elements, Elicitation, Clarifying Question, Action Selection, Best Outcome, Speech Synthesis, Output to User, Interaction Model, Interaction Context, World Knowledge]
  • 46. NLP in the Golden Age of AI NLP has an AI aspect to it. § We’re often dealing with ill-defined problems § We don’t often come up with exact solutions/algorithms § We can’t let either of those facts get in the way of making progress haithem.afli@cit.ie 46
  • 47. Artificial intelligence (AI) Beyond the Hype haithem.afli@cit.ie 47 Graph from Tobias Bohnhoff https://nativevideotube.blogspot.com/
  • 48. haithem.afli@cit.ie 48 NLP - the language industry
  • 49. The Rise of Natural Language Processing (NLP), and How it is Changing the Way we Retrieve Information. The 'creator' of Bitcoin, Satoshi Nakamoto, is the world's most elusive billionaire. Very few people outside of the Department of Homeland Security know Satoshi's real name. Satoshi has taken great care to keep his identity secret, employing the latest encryption and obfuscation methods in his communications. Despite these efforts, Satoshi Nakamoto gave investigators the only tool they needed to find him: his own words. Using NLP, the NSA (and everyone!) was able to compare texts to determine the authorship of a particular work. More info: https://tech.slashdot.org/story/17/08/28/1725232/how-the-nsa-identified-satoshi-nakamoto
  • 50. Timeline of (modern) AI haithem.afli@cit.ie Graph from The University Of Queensland Brain Institute The 1st AI Winter The second AI Winter Including CIT MSc in AI https://www.cit.ie/course/CRKARIN9 50
  • 51. The first AI winter haithem.afli@cit.ie By 1964, the National Research Council (NRC) had become concerned about the lack of progress and formed the Automatic Language Processing Advisory Committee (ALPAC) to look into the problem. They concluded, in a famous 1966 report, that machine translation was more expensive, less accurate and slower than human translation. After spending some 20 million dollars, the NRC ended all support. Image from Wikipedia 51
  • 52. haithem.afli@cit.ie In 1984, John McCarthy criticized expert systems because they lacked common sense and knowledge about their own limitations. Schwarz, Director of DARPA ISTO from 1987 to 1989 concluded that AI research has always had “… very limited success in particular areas, followed immediately by failure to reach the broader goal at which these initial successes seem at first to hint…”. Ø Decrease in funding in AI research. Ø Many AI companies closed their doors. Ø The AAAI conference that attracted over 6000 visitors in 1986 quickly decreased to just 2000 by 1991. The second AI winter 52
  • 53. The survivors: the Deep Learning Godfathers. Turing Award given for: • “The conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing.”
  • 55. 2014: Generative Adversarial Networks § The neural network at the top is the discriminator, and its task is to distinguish the training set’s real information from the generator’s creations. § In the simplest GAN structure, the generator starts with random data and learns to transform this noise into information that matches the distribution of the real data. haithem.afli@cit.ie 55
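
The interplay between the two networks on slide 55 is commonly written as a minimax game. The formula below is the standard objective from the original GAN formulation rather than something shown in the transcript; $D$ is the discriminator, $G$ the generator, $p_{\text{data}}$ the real data distribution, and $p_z$ the noise distribution the generator starts from.

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)} \big[ \log D(x) \big]
  + \mathbb{E}_{z \sim p_{z}(z)} \big[ \log \big( 1 - D(G(z)) \big) \big]
```

The discriminator tries to maximize $V$ by scoring real samples high and generated samples low, while the generator tries to minimize it by producing samples that the discriminator accepts as real.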
  • 56. Do you know this person? haithem.afli@cit.ie https://thispersondoesnotexist.com/ 56
  • 60. DeepFake § The development of deepfakes has taken place to a large extent in two settings: research at academic institutions, and development by amateurs in online communities. haithem.afli@cit.ie 60
  • 61. GAN: Applications of GANs → GANs for image editing → Using GANs for security (SSGAN: Secure Steganography Based on GAN) → De-aging Robert De Niro! (Martin Scorsese spent millions of Netflix's money to digitally de-age De Niro, Pacino, and Pesci so they could portray these men throughout different parts of their lives.)
  • 63. Why else is natural language understanding difficult? Non-standard English: "Great job @justinbieber! Were SOO PROUD of what youve accomplished! U taught us 2 #neversaynever & you yourself should never give up either♥"
  • 64. Challenges in translating User-Generated Content haithem.afli@cit.ie 64
  • 68. Why else is natural language understanding difficult? Non-standard English: "Great job @justinbieber! Were SOO PROUD of what youve accomplished! U taught us 2 #neversaynever & you yourself should never give up either♥". Segmentation issues: "the New York-New Haven Railroad" (is it "New York" + "New Haven", or "York-New"?). Idioms: dark horse, get cold feet, lose face, throw in the towel. Neologisms: unfriend, retweet, bromance. But that’s what makes it fun!
  • 69. Making progress on this problem… § The task is difficult! What tools do we need? § Knowledge about language § Knowledge about the world § A way to combine knowledge sources § How we generally do this: § Probabilistic models built from language data § P(“maison” → “house”) high § P(“L’avocat général” → “the general avocado”) low § Luckily, rough text features can often do half the job. (Dan Jurafsky and James H. Martin) → Pre-trained models
  • 70. Pre-trained models: BERT haithem.afli@cit.ie BERT makes use of Transformer, an attention mechanism that learns contextual relations between words (or sub-words) in a text. 70
  • 71. From BERT to ALBERT haithem.afli@cit.ie 71 • BERT (Google) • XLNet (Google/CMU) • RoBERTa (Facebook) • DistilBERT (HuggingFace) • CTRL (Salesforce) • GPT-2 (OpenAI) • Megatron (NVIDIA) • ALBERT (Google)
  • 75. Challenges with automatically generated texts haithem.afli@cit.ie 75
  • 76. Addressing the commonsense problem. Cunxiang Wang, Shuailong Liang, Yue Zhang, Xiaonan Li and Tian Gao. Does It Make Sense? And Why? A Pilot Study for Sense Making and Explanation.
  • 77. Language Technology. Mostly solved: spam detection ("Let’s go to Agra!" ✓, "Buy V1AGRA …" ✗); part-of-speech (POS) tagging ("Colorless green ideas sleep furiously." ADJ ADJ NOUN VERB ADV); named entity recognition (NER) ("Einstein met with UN officials in Princeton": PERSON, ORG, LOC). Making good progress: sentiment analysis ("Best roast chicken in San Francisco!", "The waiter ignored us for 20 minutes."); coreference resolution ("Carter told Mubarak he shouldn’t run again."); word sense disambiguation (WSD) ("I need new batteries for my mouse."); parsing ("I can see Alcatraz from the window!"); machine translation (MT) ("The 13th Shanghai International Film Festival…"); information extraction (IE) ("You’re invited to our dinner party, Friday May 27 at 8:30": Party, May 27, add). Still really hard: question answering (QA) ("Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?"); paraphrase ("XYZ acquired ABC yesterday" / "ABC has been taken over by XYZ"); summarization ("The Dow Jones is up", "The S&P500 jumped", "Housing prices rose" → "Economy is good"); dialog ("Where is Citizen Kane playing in SF?" "Castro Theatre at 7:30. Do you want a ticket?")
  • 78. Real Success: IBM’s Watson § Won Jeopardy on February 16, 2011! WILLIAM WILKINSON’S “AN ACCOUNT OF THE PRINCIPALITIES OF WALLACHIA AND MOLDOVIA” INSPIRED THIS AUTHOR’S MOST FAMOUS NOVEL Bram Stoker haithem.afli@cit.ie 78
  • 79. Real Success: Watson on Jeopardy § https://www.youtube.com/watch?v=WFR3lOm_xhE haithem.afli@cit.ie 79
  • 80. Ethical Issues in Dialog System Design § Machine learning systems replicate biases that occurred in the training data. § Microsoft's Tay chatbot § Went live on Twitter in 2016 § Taken offline 16 hours later § In that time it had started posting racial slurs, conspiracy theories, and personal attacks § Learned from user interactions (Neff and Nagy 2016) The Twitter profile picture of Tay haithem.afli@cit.ie 80
  • 82. Ethical Issues in Dialog System Design § Machine learning systems replicate biases that occurred in the training data. § Dialog datasets § Henderson et al. (2017) examined standard datasets (Twitter, Reddit, movie dialogs) § Found examples of hate speech, offensive language, and bias § Both in the original training data, and in the output of chatbots trained on the data. haithem.afli@cit.ie 82
  • 83. Ethical Issues in Dialog System Design: Privacy § Remember this was noticed in the days of Weizenbaum § Agents may record sensitive data § (e.g. “Computer, turn on the lights [answers the phone] Hi, yes, my password is...”), § Which may then be used to train a seq2seq conversational model. § Henderson et al. (2017) showed they could recover such information by giving a seq2seq model keyphrases (e.g., "password is")
  • 84. Ethical Issues in Dialog System Design: Gender equality § Dialog agents are overwhelmingly given female names, perpetuating the female servant stereotype (Paolino, 2017). § Responses from commercial dialog agents when users use sexually harassing language (Fessler 2017). Source: Speech and Language Processing (3rd ed. draft), Dan Jurafsky and James H. Martin
  • 85. Addressing real-world challenges § AI Technologies - Natural Language Processing (NLP) - Social Media and UGC Analysis - Computer Vision (CV) - Machine/Deep Learning (ML-DL) § Applications - Digital Humanities - Fintech - Digital Health and Life-science - Social Science and Psychology - Security and Cybersecurity
  • 86. http://www.cit.ie Computer Science Department Haithem.afli@cit.ie @AfliHaithem Thank you