This document provides an overview of statistical natural language processing (NLP). It begins by introducing the speaker, Mona Diab, and her research interests in NLP. It then discusses the growing volume of digital data being produced and the potential for machines to process and understand human language. Language, however, is complex and ambiguous, and good NLP solutions require both linguistic and machine-learning knowledge. The document outlines the goals and challenges of NLP, including resolving ambiguity, and gives examples of NLP applications and techniques, such as probabilistic models built from language data.
2. Who am I?
• Prof in CS department working on issues
of big data, data science, natural
language processing
• mtdiab@gwu.edu
• Check out my research @
– www.seas.gwu.edu/~mtdiab
• NLP lab @gw
– Care4lang1.seas.gwu.edu
3. “Every 2 days we produce as much
information as we did from the beginning of
time till 2003”
“Big Data refers to our ability to make use
of the ever-increasing volumes of data.”
“…everything we do is increasingly leaving a
digital trace (or data), which we (and others)
can use and analyze.”
Bernard Marr
4. The Dream
• It’d be great if machines could
• Process our email (usefully)
• Translate languages accurately
• Help us manage, summarize, and
aggregate information
• Use speech as a UI (when
needed)
• Talk to us / listen to us
• But they can’t:
• Language is complex, ambiguous,
flexible, and subtle
• Good solutions need linguistics
and machine learning
knowledge
Slide courtesy of Heng Ji
5. Heterogeneous Big Data
Lockheed Martin: 3,000 workers to furlough amid #USGovernmentShutdown

The Patient Protection and Affordable Care Act (PPACA),[1] commonly called the Affordable Care Act (ACA) or Obamacare, is a United States federal statute signed into law by President Barack Obama on March 23, 2010.

The U.S. Congress, still in partisan deadlock over Republican efforts to halt President Barack Obama's healthcare reforms, was on the verge of shutting down most of the U.S. government starting on Tuesday morning.

NSF and NIST are temporarily closed because the Government entered a period of partial shutdown.

President Obama's 70-minute White House meeting late Wednesday afternoon with congressional leaders, including House Speaker John Boehner, did nothing to help end the impasse.
6. Mystery
• What’s now impossible for computers (and any other
species) to do is effortless for humans
8. What is NLP?
• Fundamental goal: deep understanding of broad language use
• not just string processing or keyword matching!
9. What is NLP/CL?
• NLP: Natural Language Processing
– Is the field of making computers process natural language
• Does process entail understand?
• CL: Computational Linguistics
– Is the field of using computers to understand (natural)
language
• Natural Language?
– Refers to the language spoken by people, e.g. English,
Japanese, Swahili, as opposed to artificial languages, like C++,
Java, etc.
10. What is NLP?
• Computers using and processing natural language input (data)
and producing useful information, could be natural language
output/or structured data
• Software that can recognize, analyze and generate text and
speech
• Typically NLP refers to processing unstructured data – text in
free form (unstructured text)
• In contrast, structured data refers to information in “tables”
– Typically allows numerical range and exact match (for text)
queries, e.g., Salary < 60000 AND Manager = Smith, should
return Turner, Ian
Employee      Manager         Salary
Smith, John   David, Richard  $80,000
Turner, Ian   Smith, John     $59,000
Huang, Chang  Smith, John     $69,000
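As a toy illustration of the structured query above, the employee table and the Salary/Manager query can be sketched in Python (the field names here are hypothetical, just mirroring the table):

```python
# Toy sketch (not from the slides): the structured query
# "Salary < 60000 AND Manager = Smith" over the employee table.
employees = [
    {"employee": "Smith, John",  "manager": "David, Richard", "salary": 80000},
    {"employee": "Turner, Ian",  "manager": "Smith, John",    "salary": 59000},
    {"employee": "Huang, Chang", "manager": "Smith, John",    "salary": 69000},
]

def query(rows, max_salary, manager):
    """Numeric-range + exact-match query, the kind structured data supports."""
    return [r["employee"] for r in rows
            if r["salary"] < max_salary and r["manager"] == manager]

print(query(employees, 60000, "Smith, John"))  # ['Turner, Ian']
```

Unstructured free text supports no such exact-match query, which is what makes NLP necessary.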
11. Unstructured (text) vs. structured
(database) data in 1996
[Bar chart comparing data volume and market cap for unstructured vs. structured data]
12. Unstructured (text) vs. structured
(database) data
[Bar chart comparing data volume and market cap for unstructured vs. structured data]
13. Goals of NLP/CL
• Model Human Language Processing
• Analyze Human Language
• Facilitate Human Language Communication
via Automated Tools
15. Computers Lack Knowledge!
• Computers “see” text in English/Arabic/French
the same way you saw the previous slide!
• People have no trouble understanding language
– Common sense knowledge
– Reasoning capacity
– Experience
• However, Computers have
– No common sense knowledge
– No reasoning capacity
Unless we teach them!
16. Why Should You Care?
• An enormous amount of knowledge is now
available in machine readable form as
natural language text
• Conversational agents are becoming an
important form of human-computer
communication
• Much of human-human communication is
now mediated by computers
• Very cool stuff! And with lots of commercial
interest.
Adapted from Speech and Language Processing - Jurafsky and Martin
17. Why NLP?
• Applications for
processing large
amounts of texts
(BIG DATA)
require NLP
expertise
• Classify text into categories
• Index and search large texts
• Automatic machine translation
• Speech understanding
– Understand phone conversations
• Information extraction
– Extract useful information from
resumes
• Automatic summarization
– Condense 1 book into 1 page
• Question answering
• Knowledge acquisition
• Text generation / dialogs
19. Why is NLP intriguing?
• NLP has an AI aspect to it
– We’re often dealing with ill-defined problems
– We don’t often come up with exact solutions/
algorithms
– We can’t let either of those facts get in the
way of making progress
20. NLP in CS taxonomy
Computers
• Artificial Intelligence
– Robotics
– Search
– Natural Language Processing
• Information Retrieval
• Machine Translation
• Language Analysis (Semantics, Parsing)
• Algorithms
• Databases
• Networking
21. The Challenge
• Language is complex with infinite
possible constructions
• Good news is that there are patterns as
the symbol set is finite, but the
patterns are latent
• Abundance of raw data
22. Why is NLP hard? Some Headlines…
• Police Begin Campaign To Run Down Jaywalkers
• Iraqi Head Seeks Arms
• Enraged Cow Injures Farmer With Ax
• Teacher Strikes Idle Kids
• Squad Helps Dog Bite Victim
• Red Tape Holds Up New Bridges
• Hospitals Are Sued by 7 Foot Doctors
• Court to Try Shooting Defendant
• Local High School Dropouts Cut in Half
23. How can a machine understand
these differences?
• Get the cat with the gloves.
24. Ambiguous Spoken Example
I made her duck
• I cooked waterfowl for her
• I cooked the waterfowl that belongs to
her
• I created the ceramic duck she owns
• I caused her to quickly lower her head
• And more….
25. Example … continued!
I made her duck
• Speech recognition: “I” vs. “Eye”, “made” vs. “maid”
• Word Sense Disambiguation: “made” = cook vs. create
• Part of Speech Tagging: “duck” = verb vs. noun
• Syntactic parsing
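The fan-out of readings can be sketched with a toy, hand-built lexicon (all entries hypothetical, just to show how per-word alternatives multiply):

```python
# Toy sketch: each ambiguous word multiplies the number of candidate
# analyses of "I made her duck". The lexicon below is hand-made.
from itertools import product

lexicon = {
    "made": ["cook", "create", "cause"],                       # word senses
    "her":  ["dative (for her)", "possessive (belonging to her)"],
    "duck": ["noun (waterfowl)", "verb (lower the head)"],
}

def candidate_readings(words):
    """Cartesian product of the per-word alternatives."""
    return list(product(*(lexicon[w] for w in words)))

readings = candidate_readings(["made", "her", "duck"])
print(len(readings))  # 3 * 2 * 2 = 12 candidate analyses
```

Not all 12 combinations are coherent, which is exactly the disambiguation problem the later slides address.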
26. Linguistics
• The scientific study of human
language
• How the mind comes up with language
27. Levels of Language Description
• 6 basic levels (more or less explicitly present in most theories):
– and beyond (pragmatics/logic/...)
– meaning (semantics)
– (surface) syntax
– morphology
– phonology
– phonetics/orthography
• Each level has an input and output representation
– output from one level is the input to the next (upper)
level
– sometimes levels might be skipped (merged) or split
28. The Steps in NLP
Discourse
Pragmatics
Semantics
Syntax
Morphology
**We can go up and down, and
combine steps too!!
**Every step is equally complex
29. The View: Ambiguity
• All 6 levels of linguistic knowledge require
resolving ambiguity
• Ambiguity results from the existence of
multiple possibilities for each level
30. Ambiguity
• Computational linguists are obsessed with ambiguity
• Ambiguity is a fundamental problem of computational
linguistics
• Resolving ambiguity is a crucial goal
38. Making progress on this problem…
• The task is difficult! What tools do we
need?
– Knowledge about language
– Knowledge about the world
– A way to combine knowledge sources
• How we generally do this:
– probabilistic models built from language data
• P(“maison” → “house”) high
• P(“L’avocat général” → “the general avocado”) low
– Luckily, rough text features can often do half
the job.
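A rough sketch of such a probabilistic preference, with made-up probabilities standing in for ones estimated from language data:

```python
# Hypothetical translation probabilities, in the spirit of the slide:
# a model trained on data should score "lawyer" far above "avocado"
# for French "avocat" in "L'avocat général".
p_translation = {
    ("maison", "house"):   0.92,
    ("maison", "home"):    0.07,
    ("avocat", "lawyer"):  0.80,
    ("avocat", "avocado"): 0.20,
}

def score(pairs):
    """Product of per-word translation probabilities (a crude model)."""
    p = 1.0
    for src, tgt in pairs:
        p *= p_translation.get((src, tgt), 1e-6)  # tiny floor for unseen pairs
    return p

print(score([("avocat", "lawyer")]) > score([("avocat", "avocado")]))  # True
```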
39. CL Toolkit
• Knowledge of Linguistics, i.e. what NLPers call features!!
• State Machines
– Finite state automata, transducers
• Formal Rule Systems
– Regular Grammars, Context Free Grammars
• Logic
– First order logic, predicate calculus
• Probability Theory
– Associating probabilities with the previous machinery
• Machine Learning Tools
– Learning automatically from representations; plays a very important role in cases where
we don’t have good explanations of why things happen the way they do
• Performance Metrics
– Well defined evaluation metrics for different tasks
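As a minimal sketch of the state-machine entry in the toolkit, here is a finite state automaton for the "sheep talk" language (b a a+ !) from the Jurafsky and Martin textbook cited earlier; the dictionary encoding is just one convenient way to write it:

```python
# A deterministic finite state automaton for /baa+!/ ("sheep talk").
TRANSITIONS = {  # (state, symbol) -> next state
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,   # self-loop: any further a's
    (3, "!"): 4,
}
ACCEPT = {4}

def accepts(s):
    """Run the automaton; reject on any missing transition."""
    state = 0
    for ch in s:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in ACCEPT

print(accepts("baaa!"), accepts("ba!"))  # True False
```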
41. Models and Algorithms
• By models we mean the formalisms that
are used to capture the various kinds of
linguistic knowledge we need.
• Algorithms are then used to manipulate
the knowledge representations needed
to tackle the task at hand.
42. Models
• Finite state machines
• Linguistic Rules
• Markov models
• Alignment
• Vector space model of word and
document meaning
• Logical formalisms
• Network models
43. Algorithms
• Rule-based
– Symbolic Parsers and morphological
analyzers
– Finite state automata
• Probabilistic/statistical
– Learned from observation of (labeled) data
– Predicting new data based on old
– Machine learning
44. Algorithms
• Many of the algorithms that we’ll study will turn out to
be transducers; algorithms that take one kind of
structure as input and output another
• Unfortunately, ambiguity makes this process difficult
• This leads us to employ algorithms that are designed to
handle ambiguity of various kinds
• State-space search paradigm: To manage the problem
of making choices during processing when we lack
the information needed to make the right choice
45. Machine Learning
Machine learning based classifiers that are
trained to make decisions based on (implicitly
or explicitly modeled) features from context
Simple Classifiers:
Naïve Bayes
Logistic Regression (MaxEnt)
Decision Trees
Neural Networks
Sequence Models:
Hidden Markov Models
Maximum Entropy Markov Models
Conditional Random Fields
Recurrent Neural Networks (RNNs, LSTMs)
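A minimal sketch of the first classifier on the list, Naïve Bayes, trained on a tiny made-up sentiment dataset with add-one smoothing (the data and labels are invented for illustration):

```python
# Toy Naive Bayes text classifier with add-one (Laplace) smoothing.
from collections import Counter
import math

train = [
    ("the movie rocked loved it", "pos"),
    ("simply wonderful deep film", "pos"),
    ("too robotic gets on my nerves", "neg"),
    ("boring flat and robotic", "neg"),
]

word_counts = {"pos": Counter(), "neg": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for c in word_counts.values() for w in c}

def predict(text):
    """Return the class with the highest log posterior."""
    scores = {}
    for label in class_counts:
        total = sum(word_counts[label].values())
        logp = math.log(class_counts[label] / sum(class_counts.values()))
        for w in text.split():
            logp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = logp
    return max(scores, key=scores.get)

print(predict("loved the movie"))     # pos
print(predict("that robotic actor"))  # neg
```

The same count-and-smooth recipe underlies many of the sequence models above, with transition probabilities added.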
46. Approaching the challenge
• Divide & Conquer
– Break the problem into smaller problems
• Throw state of the art techniques at
the smaller problems
• Keep your fingers crossed!!
47. NLP Categories
• Applications
• Word counters (wc in UNIX)
• Spell Checkers, grammar checkers
• Predictive Text on mobile handsets
• Machine Translation (MT)
• Information Retrieval (IR)
• Automatic Speech Recognition (ASR)
• Optical Character Recognition (OCR)
• Automatic Summarization, Speech Synthesis, etc.
• Enabling Technologies
– Tokenization
– Part-of-Speech Tagging
– Syntactic Parsing
– Lemmatization
– Word Sense Disambiguation, etc.
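The first enabling technology, tokenization, can be roughly sketched with a single regular expression (real tokenizers handle many more edge cases, e.g. contractions, abbreviations, URLs):

```python
# Crude tokenizer: word-like runs, or single non-space punctuation marks.
import re

def tokenize(text):
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Computers can't (yet) understand us."))
```

Note how even this tiny example makes a debatable choice: "can't" is split into three tokens, and a different tokenizer might keep it whole.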
48. • Alan Turing was a pioneering British
computer scientist, mathematician,
logician, and cryptanalyst. He is widely
considered the Father of Computer
Science.
• The movie The Imitation Game is about him.
• The Turing test is a test of a machine's ability to exhibit
intelligent behavior equivalent to, or indistinguishable from, that
of a human. Turing proposed that a human evaluator would judge
natural language conversations between a human and a machine
that is designed to generate human-like responses.
Turing Test
Courtesy of Nizar Habash
49. Current Real-World Applications
• Search: very large corpora, e.g. Google
• Information Extraction: relevant information to a task
• Sentiment analysis: restaurant or movie reviews
• Summarizing very large amounts of text or speech: e.g.
your email, the news, voicemail
• Translating between one language and another: e.g.
Google Translate, Babelfish
• Dialogue systems: e.g. chatbots, Amtrak’s ‘Julie’
• Question answering: e.g. IBM’s Watson Jeopardy!,
DARPA who/what/where…, Ask Jeeves
• Even more: speech processing, common sense
knowledge, text categorization, web monitoring, etc.
52. Machine Translation
• Basic types of Machine Translation
– Text to Text Machine Translations
– Speech to Speech Machine Translations
• To date, majority of approaches have
targeted rich language pairs (with lots of
automated resources) – No Swahili-German
systems
• Current approaches are statistical,
learning from existing translations (parallel
data collections)
• Reasonable performance due to significant
funding
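The "learning from existing translations" idea can be sketched as relative-frequency estimation over word-aligned pairs (the aligned data below is invented for illustration):

```python
# Toy sketch: estimate word translation probabilities by counting
# how often each (source, target) pair occurs in aligned parallel text.
from collections import Counter

aligned_pairs = [  # hypothetical (French word, English word) alignments
    ("maison", "house"), ("maison", "house"), ("maison", "home"),
    ("chat", "cat"), ("chat", "cat"),
]

counts = Counter(aligned_pairs)
src_totals = Counter(src for src, _ in aligned_pairs)

def p_translate(src, tgt):
    """Relative frequency: count(src, tgt) / count(src)."""
    return counts[(src, tgt)] / src_totals[src]

print(p_translate("maison", "house"))  # 2/3
```

Real statistical MT systems learn the alignments themselves (they are not given), but the estimation step is this same counting idea at scale.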
56. Blog Analytics
• Data-mining of blogs, discussion forums,
message boards, user groups, and other
forms of user generated media
– Product marketing information
– Political opinion tracking
– Social network analysis
– Buzz analysis (what’s hot, what topics are
people talking about right now).
57. Livejournal.com:
I, me, my on or after Sep 11, 2001
[Line graph of first-person-singular pronoun use over time, from Pennebaker’s slides]
Cohn, Mehl, Pennebaker. 2004. Linguistic markers of psychological change surrounding September 11, 2001. Psychological Science 15, 10: 687-693.
58. September 11 LiveJournal.com study:
We, us, our
[Line graph of first-person-plural pronoun use over time, from Pennebaker’s slides]
Cohn, Mehl, Pennebaker. 2004. Linguistic markers of psychological change surrounding September 11, 2001. Psychological Science 15, 10: 687-693.
59. Sentiment Analysis
• Movie Review Mining
– User1: The Matrix rocked, I simply loved it….
– User2: Really, that Keanu Reeves gets on my nerves,
he is too robotic
– User1: it was way deep, it obviously went over your
head!
– User2: I think it GOT INTO ur head :)
• What do you think User1 and User2’s
sentiments are toward the movie?
– User1
– User2
• What do you think the sentiment of User2
toward User1 is?
61. What about positive restaurant reviews?
Sex, Drugs, and Dessert
• sexy food
• seductively seared foie
gras
— addicted to pepper shooters
— garlic noodles… my drug of
choice
— the fries are like crack
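A naïve lexicon-based scorer (the word lists here are made up) shows exactly why these reviews are hard: out of context, "drug" and "crack" look negative even when the reviewer means praise.

```python
# Toy lexicon-based sentiment scorer: count positive minus negative words.
POSITIVE = {"loved", "rocked", "wonderful", "sexy"}
NEGATIVE = {"robotic", "nerves", "drug", "crack", "flimsy"}

def polarity(text):
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(polarity("the matrix rocked i simply loved it"))  # 2 -> positive
print(polarity("the fries are like crack"))             # -1, though the reviewer means praise
```

Context-aware models are needed to recover the intended positive reading.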
62. Let’s Zoom in
• Beyond Sentiment
– Perspective Identification
• Why would you need that?
– Polling
– Detection of bias
• How would you go about doing it?
– Sentiment analysis
– Opinion mining
– Cognitive state (belief)
63. Dialogue Systems
• User: Can you book me a flight from New York
City to Washington?
• System: Is your destination Washington
state?
• User: No I mean Washington DC
• System: When would you like to fly?
• Example: Amtrak’s Julie system
64. Tell me what's bothering you.
http://www.masswerk.at/elizabot/eliza.html
65. Eliza
• Eliza is a mock Rogerian psychotherapist
– Rogerian psychotherapy, or person-centered therapy, is a
form of talk psychotherapy developed by psychologist Carl
Rogers in the 1940s & 1950s.
• The original program was described by
Joseph Weizenbaum in 1966.
• Primitive natural language processing using simple
scripts and regular expressions
• Chatterbots or chatbots
• Eliza effect
– the tendency to unconsciously assume computer behaviors
are analogous to human behaviors.
• Turing test
66. • Men are all alike.
• IN WHAT WAY?
• They're always bugging us about something or other.
• CAN YOU THINK OF A SPECIFIC EXAMPLE?
• Well, my boyfriend made me come here.
• YOUR BOYFRIEND MADE YOU COME HERE
• He says I'm depressed much of the time.
• I AM SORRY TO HEAR YOU ARE DEPRESSED
• It's true. I am unhappy
• DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE UNHAPPY
• I need some help, that much seems certain.
• WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP
• Perhaps I could learn to get along with my mother.
• TELL ME MORE ABOUT YOUR FAMILY
• My mother takes care of me.
• WHO ELSE IN YOUR FAMILY TAKES CARE OF YOU
• My father.
• YOUR FATHER
• You are like my father in some ways.
Original Chatting Program
Eliza
67. Eliza-style regular expressions
Step 1: replace first person with second person references
s/\bI('m| am)\b/YOU ARE/g
s/\bmy\b/YOUR/g
s/\bmine\b/YOURS/g
Step 2: use additional regular expressions to generate replies
s/.* YOU ARE (depressed|sad) .*/I AM SORRY TO HEAR YOU ARE \1/
s/.* YOU ARE (depressed|sad) .*/WHY DO YOU THINK YOU ARE \1/
s/.* all .*/IN WHAT WAY/
s/.* always .*/CAN YOU THINK OF A SPECIFIC EXAMPLE/
Step 3: use scores to rank possible transformations
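The sed-style rules above can be re-expressed as runnable Python (a small sketch; the real Eliza script has many more rules plus the ranking step):

```python
# Minimal Eliza-style responder: reflect pronouns, then apply the
# first matching transformation rule.
import re

REFLECT = [(r"\bI('m| am)\b", "YOU ARE"), (r"\bmy\b", "YOUR"), (r"\bmine\b", "YOURS")]
RULES = [
    (r".*YOU ARE (depressed|sad).*", r"I AM SORRY TO HEAR YOU ARE \1"),
    (r".*\ball\b.*", "IN WHAT WAY"),
    (r".*\balways\b.*", "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
]

def eliza(utterance):
    # Step 1: swap first-person references to second person.
    for pat, rep in REFLECT:
        utterance = re.sub(pat, rep, utterance, flags=re.IGNORECASE)
    # Step 2: apply the first transformation rule that matches.
    for pat, rep in RULES:
        m = re.match(pat, utterance, flags=re.IGNORECASE)
        if m:
            return m.expand(rep)  # fills in \1 if the rule captured a group
    return "TELL ME MORE"

print(eliza("I am depressed"))      # I AM SORRY TO HEAR YOU ARE depressed
print(eliza("Men are all alike."))  # IN WHAT WAY
```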
68. • Let’s chat with Mitsuku!
• http://www.mitsuku.com
• Loebner prize winner 2013,
runner up 2015
– Modern form of the Turing test
for Artificial Intelligence
Mitsuku
Slide courtesy of Nizar Habash
70. Question Answering: IBM’s
Watson
• Won Jeopardy on February 16, 2011!
70
WILLIAM WILKINSON’S
“AN ACCOUNT OF THE PRINCIPALITIES OF
WALLACHIA AND MOLDAVIA”
INSPIRED THIS AUTHOR’S
MOST FAMOUS NOVEL
Bram Stoker
76. Information Retrieval
• Very successful enterprise: Google, Bing,
Yahoo, Altavista
• General model: given a huge collection of texts
(document collection), given a query
– Task: find specific documents that are relevant to
the given query
– How: Create an index, like the index in a book to
look up the information, predominant approaches
include vector space models
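The vector space approach can be sketched with raw term-count vectors and cosine similarity (no tf-idf weighting or inverted index here; the documents are invented):

```python
# Toy vector space retrieval: bag-of-words vectors ranked by cosine similarity.
from collections import Counter
import math

docs = {
    "d1": "government shutdown halts NSF and NIST",
    "d2": "the camera is small and light",
    "d3": "congress in deadlock over the shutdown",
}

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def search(query):
    """Return the name of the most similar document."""
    q = Counter(query.lower().split())
    vecs = {name: Counter(text.lower().split()) for name, text in docs.items()}
    return max(vecs, key=lambda name: cosine(q, vecs[name]))

print(search("government shutdown"))  # d1
```

A real engine adds tf-idf weighting, so that frequent words like "the" stop dominating the scores, and an inverted index so the comparison scales to billions of documents.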
77. Information Extraction
Subject: curriculum meeting
Date: January 15, 2012
To: Dan Jurafsky
Hi Dan,
we’ve now scheduled the curriculum meeting.
It will be in Gates 159 tomorrow from 10:00-11:30.
-Chris

Create new Calendar entry
Event: Curriculum mtg
Date: Jan-16-2012
Start: 10:00am
End: 11:30am
Where: Gates 159
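A rough sketch of pulling those calendar fields out of the email text with regular expressions (real IE systems use learned models, not just hand-written patterns):

```python
# Toy information extraction: grab the room and time span from the email.
import re

email = ("we've now scheduled the curriculum meeting. "
         "It will be in Gates 159 tomorrow from 10:00-11:30.")

where = re.search(r"in ([A-Z]\w* \d+)", email).group(1)
start, end = re.search(r"from (\d{1,2}:\d{2})-(\d{1,2}:\d{2})", email).groups()
print(where, start, end)  # Gates 159 10:00 11:30
```

Resolving "tomorrow" to Jan-16-2012 requires the email's date header plus temporal reasoning, which is exactly the kind of context a pattern alone cannot supply.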
78. Information Extraction
• nice and compact to carry!
• since the camera is small and light, I won't
need to carry around those heavy, bulky
professional cameras either!
• the camera feels flimsy, is plastic and very
light in weight; you have to be very delicate
in the handling of this camera
[Diagram mapping the reviews to the “size and weight” attribute, out of:
zoom, affordability, size and weight, flash, ease of use]
81. Reminder of who I am :)
• Prof in CS department working on issues
of big data, data science, natural
language processing
• mtdiab@gwu.edu
• Check out my research @
– www.seas.gwu.edu/~mtdiab
• NLP lab @gw
– Care4lang1.seas.gwu.edu