Natural Language Processing
Literature Survey
Overview of Computerized Deception Detection
Yoav Francis, IDC Herzelia

11/06/2013

Part 1 - Topics of Interest
1. Sentiment Analysis
2. Sign Language (capture and recognition)
3. Computational Creative Naming
4. Computerized Deception Detection
5. NLP Approaches for Multiword Expressions
6. Answer Extraction
7. Natural Language Generation
8. Automatic Text Summarization
9. NLP-based Bibliometrics
10. Natural Language User Interfaces for Relational Databases
11. Truecasing - Restoring Case Information for badly/non-cased text
12. The Web as a Corpus

Part 2 - Extension on 4 Selected Topics
1 - Sign Language (capture and recognition)
American Sign Language (ASL) is the primary means of communication for around 1.5 million deaf people in the United States [2]. It is a visual-gestural language that uses upper-body gestures. There is no written form of sign language, so corpora currently take the form of videos [13], and the NLP field may need to adapt to this kind of data. There have been a few attempts to create sign language corpora ([13]), but they have yet to be studied from a linguistic / NLP perspective. New tools, and adaptations of existing ones, need to be developed to meet this challenge, in particular with regard to timing, spatial reference, inflection, and new methods of unified motion capture for use in sign language analysis.

2 - Truecasing
Truecasing is the problem of determining the proper capitalization of a sentence or document that is uncapitalized or wrongly capitalized. It applies to English and to any other language whose script distinguishes between lower-case and upper-case letters; it is irrelevant for languages that are not written in the Latin, Cyrillic, Greek or Armenian alphabets. Besides improving readability, truecasing aids many tasks, such as entity recognition, translation and content extraction. The main aim of the process is to restore case information to raw text ([11, 14]).
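To make the idea concrete, here is a minimal sketch of a unigram baseline: it learns the most frequent surface form of each word from correctly cased text and applies it to lowercased input. This is an illustration only, not the method of [11]; the function names and the tiny training corpus are assumptions made up for the example.

```python
from collections import Counter, defaultdict


def train_truecaser(cased_sentences):
    """Count, for every lowercased token, how often each surface form occurs."""
    counts = defaultdict(Counter)
    for sentence in cased_sentences:
        tokens = sentence.split()
        # Skip the sentence-initial token: its capital letter is positional,
        # not evidence about the word's usual casing.
        for token in tokens[1:]:
            counts[token.lower()][token] += 1
    return counts


def truecase(sentence, counts):
    """Restore the most frequent casing of each token; capitalize the first token."""
    restored = []
    for token in sentence.split():
        forms = counts.get(token.lower())
        # Unknown words fall back to lower case.
        restored.append(forms.most_common(1)[0][0] if forms else token.lower())
    if restored:
        restored[0] = restored[0][0].upper() + restored[0][1:]
    return " ".join(restored)


# Made-up training sentences, for illustration only.
corpus = [
    "We flew from Tel-Aviv to London in May",
    "The flight to London was late",
    "She may visit Tel-Aviv in May",
]
model = train_truecaser(corpus)
print(truecase("they flew to london from tel-aviv", model))
# prints: They flew to London from Tel-Aviv
```

A baseline like this ignores context, so it cannot tell the month "May" from the verb "may"; handling such cases is where sequence models become necessary.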

3 - Natural Language User Interfaces for Relational Databases
A natural language interface for a database allows the user to type in natural language queries (such as: what buses leave at 16:00 from Tel-Aviv?), which are then transformed into an SQL query. This translation phase poses an NLP challenge: it requires a morphological and syntactic analysis, followed by a semantic analysis, in order to transform the user's question into a few intermediate-language representations that correspond to the possible readings of the question, before choosing the one that will be transformed into an SQL query. This architecture is formally known as a Natural Language Interface for Databases (NLIDB). A popular implementation of such an NLIDB, called Edite, is widely available ([10, 15]).
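The question-to-representation-to-SQL pipeline can be sketched in a few lines. The example below is a toy under strong assumptions: a single hard-coded question pattern stands in for the morphological, syntactic and semantic analysis, and the `buses` table, its columns and its rows are all hypothetical. It is not how Edite or any other real NLIDB is implemented.

```python
import re
import sqlite3

# Hypothetical pattern covering one question shape only.
QUESTION = re.compile(
    r"what (?P<entity>\w+) leave (?:at|on) (?P<time>\d{1,2}:\d{2}) "
    r"from (?P<origin>[\w-]+)\??$",
    re.IGNORECASE,
)


def parse_question(question):
    """Map a question to an intermediate representation (a plain dict), or None."""
    match = QUESTION.match(question.strip())
    if match is None:
        return None
    return {
        "table": match.group("entity").lower(),
        "filters": {"departure": match.group("time"), "origin": match.group("origin")},
    }


def to_sql(representation):
    """Turn the intermediate representation into a parameterized SQL query."""
    allowed_tables = {"buses"}  # whitelist: table names cannot be bound as parameters
    if representation["table"] not in allowed_tables:
        raise ValueError("unknown table")
    columns = sorted(representation["filters"])
    where = " AND ".join(f"{column} = ?" for column in columns)
    params = [representation["filters"][column] for column in columns]
    return f"SELECT line FROM {representation['table']} WHERE {where}", params


# Made-up schedule data in an in-memory database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE buses (line TEXT, origin TEXT, departure TEXT)")
connection.executemany(
    "INSERT INTO buses VALUES (?, ?, ?)",
    [("480", "Tel-Aviv", "16:00"), ("910", "Haifa", "16:00"), ("405", "Tel-Aviv", "17:30")],
)

sql, params = to_sql(parse_question("What buses leave at 16:00 from Tel-Aviv?"))
rows = connection.execute(sql, params).fetchall()
print(sql, params)
print(rows)  # prints: [('480',)]
```

Keeping the intermediate representation separate from the SQL string is the design point: several candidate representations can be produced and ranked before any query is generated, and user-supplied values only ever reach the database as bound parameters.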

4 - Computerized Deception Detection
Computerized detection of deception is the process of assessing the authenticity and truthfulness of a given text (for example, detecting someone who writes false reviews). Methods for doing so can be purely lexical (in the sense that they simply use dictionary word counts), or can use POS tagging and n-grams for a higher rate of success. Previous insights include, for example, that deceivers use verbs and pronouns more often. More complex approaches that yield a better detection rate refer to the syntactic stylometry of the text, using CFG parse trees. One application of this detection is identifying fake reviews (opinion spam) ([4, 16, 17]).
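As an illustration of the purely lexical end of this spectrum, the sketch below computes one dictionary-based feature: the fraction of tokens that are first-person pronouns. The mini-lexicon and the function name are assumptions for the example; a real system would rely on a full dictionary and on many such categories.

```python
import re

# Hypothetical mini-lexicon; a real system would use a full dictionary.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}


def category_rate(text, lexicon):
    """Fraction of word tokens in `text` that belong to `lexicon`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in lexicon for token in tokens) / len(tokens)


review = "My husband and I stayed here and we loved our room"
rate = category_rate(review, FIRST_PERSON)
print(round(rate, 3))  # 4 of the 11 tokens are first-person pronouns
```

Rates like this one become the input features of a classifier; on their own they say nothing about whether a given review is deceptive.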

Part 3 - 2-Page Survey - Computerized Deception Detection
Deception detection, or deceptive opinion detection, is the task of deciding whether a given text that carries some opinion is deceptive (or false). To clarify what this means, take, for instance, a hotel review site: an adversary may post a review that was deliberately written to sound authentic, in order to deceive the reader into believing that the review is truthful. The `deception` we refer to in this summary is that of user reviews / opinions.
The task at hand is therefore, given some text (or review), to decide whether it is truthful. The need for this is rather clear: preventing deceptive opinion spam [17] in media where reviews or opinions are written or posted. Nowadays, when crowdsourcing platforms such as Amazon Mechanical Turk exist, deceptive opinions can easily be generated and can bias a user for the better (or for the worse).
The task is quite challenging, since we want as few false positives as possible, and the task itself involves many aspects of the field of natural language processing.
As with many other natural-language tasks, this task also requires data, for example from review websites (in [4], for example, data from TripAdvisor was used). We need some `reviews` that are guaranteed to be truthful and some that are guaranteed to be deceptive, that is, a gold data set against which we can compare our evaluation. It is worth noting that even without an applicable gold data set, there exists a heuristic approach for evaluation ([19]).
In turning to evaluate deceptive reviews, we shall consider the case in which such a gold set exists (in [17], the gold set of deceptive reviews was generated using Amazon Mechanical Turk). As for the 'truthful' part of the gold set, truthful reviews can be collected from authenticated and reputable users (this was also done in [17]). Such data sets, which can be domain-specific, are publicly available ([20]).
Before attempting a machine-based evaluation, it is interesting to inspect the performance of human evaluation. [17] reports that human detection of deceit is poor: in their test, human judges reached a maximum accuracy of 61% in telling truth from deceit, and the agreement between different people's decisions on a given review was close to chance.
As for automated NLP approaches to the problem, several exist:
One approach analyzes the frequency of POS tags as the basis for deciding whether a given text is deceitful or truthful. In the analysis in [17], this method was shown to have the lowest accuracy of all the machine-based methods.
A second approach is based on psycholinguistics, aiming to detect personality traits. A widely used tool for this exists (LIWC, [21]). It is essentially a more socially oriented counterpart to the POS-tagging approach above. Analysis of this method yielded somewhat better results than the POS-based one.
A third approach introduces n-grams into the model and treats the problem as text categorization. Using this type of classification dramatically increased the success of detection and yielded an accuracy of ~88%. This indicates that the context of words in the sentence (that is, n-gram-based detection) is a major contributor when detecting deceptive opinions.
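The n-gram approach can be sketched as a small text classifier. The example below uses a multinomial Naive Bayes model with add-one smoothing over unigram and bigram counts; the choice of Naive Bayes here, the class and function names, and the four training reviews are all assumptions made for illustration. The toy data is invented and far too small to say anything about real accuracy.

```python
import math
from collections import Counter


def ngrams(text, max_n=2):
    """Return all word n-grams of length 1..max_n from a lowercased text."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over n-gram counts."""

    def fit(self, texts, labels):
        self.feature_counts = {label: Counter() for label in set(labels)}
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(ngrams(text))
        self.vocabulary = set().union(*self.feature_counts.values())
        return self

    def predict(self, text):
        scores = {}
        total_docs = sum(self.label_counts.values())
        for label, counts in self.feature_counts.items():
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(self.label_counts[label] / total_docs)
            for feature in ngrams(text):
                if feature in self.vocabulary:  # ignore n-grams never seen in training
                    score += math.log((counts[feature] + 1) / denominator)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented toy reviews, for illustration only.
train_texts = [
    "my husband and i loved this amazing hotel",
    "my family and i had an amazing luxury experience",
    "the room was small and the bathroom was clean",
    "the location is close to the train and the price was fair",
]
train_labels = ["deceptive", "deceptive", "truthful", "truthful"]

classifier = NaiveBayes().fit(train_texts, train_labels)
print(classifier.predict("my husband and i had an amazing stay"))
print(classifier.predict("the bathroom was clean and the price was fair"))
```

Bigrams such as "my husband" carry the word context that single-word counts lose, which is the property the paragraph above credits for the improvement.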

Finally, a recently published article [4] suggested an even more novel approach: taking into account the `syntactic stylometry` of the text (that is, evaluating the similarity of different opinions based on the 'style of writing'). According to [22], similar work on syntactic stylometry has been done for authorship attribution, and even for age attribution of blog authors [23].
This more novel method can be implemented with techniques based on Probabilistic Context Free Grammar (PCFG) parse trees, as this is the most prominent technique for analysis of syntactic stylometry [17, 22, 23]. The previously mentioned methods are based only on shallow lexico-syntactic features. In [4], analysis of this method yielded very strong statistical evidence of deep syntactic patterns that allow us to detect deceitful texts with very high accuracy (91.2%).
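One way to turn a parse tree into features is to list its production rules and count them. The sketch below does this for a tree given as a bracketed string. It assumes the parse already exists (in practice it would come from a PCFG parser); the example tree is hand-written, the function names are hypothetical, and the exact feature set used in [4] may differ.

```python
from collections import Counter


def parse_tree(bracketed):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    position = 0

    def read_node():
        nonlocal position
        position += 1                       # consume "("
        label = tokens[position]
        position += 1
        children = []
        while tokens[position] != ")":
            if tokens[position] == "(":
                children.append(read_node())
            else:
                children.append(tokens[position])  # a word (leaf)
                position += 1
        position += 1                       # consume ")"
        return (label, children)

    return read_node()


def production_rules(node, lexicalized=False):
    """Yield one 'parent -> children' rule per node, in pre-order."""
    label, children = node
    if all(isinstance(child, str) for child in children):
        # Preterminal such as (NN hotel): emit only if lexicalized rules are wanted.
        if lexicalized:
            yield f"{label} -> {' '.join(children)}"
        return
    names = (child if isinstance(child, str) else child[0] for child in children)
    yield f"{label} -> {' '.join(names)}"
    for child in children:
        if not isinstance(child, str):
            yield from production_rules(child, lexicalized)


# Hand-written parse of "I stayed at the hotel", for illustration only.
tree = parse_tree("(S (NP (PRP I)) (VP (VBD stayed) (PP (IN at) (NP (DT the) (NN hotel)))))")
rules = list(production_rules(tree))
print(rules)
# prints: ['S -> NP VP', 'NP -> PRP', 'VP -> VBD PP', 'PP -> IN NP', 'NP -> DT NN']
features = Counter(rules)  # rule counts can then be fed to a classifier
```

Unlike n-grams, these rules describe sentence structure independently of the particular words used, which is what makes them "deep" rather than shallow lexico-syntactic features.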
It is also worth noting that in all the machine models described above, precision and recall were very close to each other, as can be seen in the comparison table in [17].
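For reference, precision and recall for the deceptive class can be computed from gold and predicted labels as sketched below. The function name and the label lists are made up for the example and do not reproduce any figures from [17].

```python
def precision_recall_f1(gold, predicted, positive="deceptive"):
    """Precision, recall and F1 for the `positive` class."""
    pairs = list(zip(gold, predicted))
    true_pos = sum(g == positive and p == positive for g, p in pairs)
    false_pos = sum(g != positive and p == positive for g, p in pairs)
    false_neg = sum(g == positive and p != positive for g, p in pairs)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented labels: 4 deceptive and 4 truthful reviews.
gold = ["deceptive"] * 4 + ["truthful"] * 4
predicted = ["deceptive", "deceptive", "deceptive", "truthful",
             "deceptive", "deceptive", "truthful", "truthful"]

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(precision, recall)  # prints: 0.6 0.75
```

Here 3 of the 5 reviews flagged as deceptive really are deceptive (precision 0.6), and 3 of the 4 deceptive reviews are caught (recall 0.75). Low precision corresponds to the false positives that the task tries to minimize.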
Further research has also been done on duplicate opinion detection (in the sense that the same writer wrote duplicate reviews, but wrote each in a `different way`), and on specific deception detection techniques that can be model-specific ([18]).
As a quick test for the reader, and to illustrate how limited human skill at detecting deceit is, have a look at figure 1 and see if you can tell which review is truthful and which is deceitful (this was taken from [17]).

Figure 1: Truthful and Deceitful Reviews/Opinions

1. I have stayed at many hotels traveling for both business and pleasure and I can honestly stay that The James is tops. The service at the hotel is first class. The rooms are modern and very comfortable. The location is perfect within walking distance to all of the great sights and restaurants. Highly recommend to both business travelers and couples.
2. My husband and I stayed at the James Chicago Hotel for our anniversary. This place is fantastic! We knew as soon as we arrived we made the right choice! The rooms are BEAUTIFUL and the staff very attentive and wonderful!! The area of the hotel is great, since I love to shop I couldn't ask for more!! We will definatly be back to Chicago and we will for sure be back to the James Chicago.

Future work obviously includes adapting the above methods to other problem domains, for example reviews of other kinds, or any platform where user feedback and opinion are possible. Deception is a rather prevalent phenomenon ([24]) in many media where users can express their opinions. Another interesting direction would be to analyze deception and truthfulness on combined data from many different data sets (for example, hotel reviews, movie reviews, product reviews, etc.) and to see whether we can come up with valid deception criteria for a text from any of these domains, rather than criteria that hold only for a specific domain after training on that domain.
Personally, and to conclude: I found the topic of deception detection and its relation to NLP quite fascinating, and very much enjoyed reading the relevant papers on the subject. It seems that we are `almost there` in creating and streamlining a product that will be able to detect deceptive opinions on the web (or anywhere else).

Part 4 - References
[1] Becky Sue Parton, Sign Language Recognition and Translation: A Multidisciplined Approach From the Field of Artificial Intelligence, Journal of Deaf Studies, 2011
[2] Lu and Huenerfauth, Collecting a Motion-Capture Corpus of American Sign Language for Data-Driven Generation Research, NAACL HLT, 2010
[3] Ozbal and Strapparava, A Computational Approach to the Automation of Creative Naming, ACL 2012
[4] Feng, Banerjee and Choi, Syntactic Stylometry for Deception Detection, ACL 2012
[5] Sag, Baldwin et al., Multiword Expressions: A Pain in the Neck for NLP, Stanford University LinGO Project, 2001
[6] Abney, Collins and Singhal, Answer Extraction, ATT Shannon Labs, ANLC 2000
[7] Reiter and Dale, Building Natural Language Generation Systems, Cambridge Press, 2000
[8] Hahn and Reimer, Advances in automatic text summarization, MIT Press, 1999
[9] Abu-Jbara, Ezra and Radev, Purpose and Polarity of Citation: Towards NLP-based Bibliometrics, NAACL-HLT 2013
[10] Filipe and Mamede, Databases and Natural Language Interfaces, CSTC Portugal, 2007
[11] Lita, Roukos et al., tRuEcasIng, ACL 2003
[12] Kilgarriff and Grefenstette, Introduction to the Special Issue on the Web as Corpus, ACL 2003
[13] Segouat and Braffort, Toward Categorization of Sign Language Corpora, AFNLP 2009
[14] English Wikipedia, Truecasing
[15] Stratica, Kosseim and Desai, NLIDB Templates for Semantic Parsing, Concordia University, Canada
[16] Argamon, Koppel and Avneri, Style-based Text Categorization: What Newspaper Am I Reading?, AAAI 1998
[17] Ott et al., Finding Deceptive Opinion Spam by Any Stretch of the Imagination, ACL 2011
[18] Jindal and Liu, Opinion Spam and Analysis, WSDM 2008
[19] Wu et al., Distortion as a Validation Criterion in the Identification of Suspicious Reviews, SOMA 2010
[20] TripAdvisor Ireland Dataset, http://mlg.ucd.ie/datasets/trip
[21] Linguistic Inquiry and Word Count (LIWC), http://www.liwc.net/
[22] Hollingsworth, Syntactic Stylometry: Using Sentence Structure for Authorship Attribution, University of Georgia, 2012
[23] Jaget Sastry, Blogger Age Attribution Using Syntactic Stylometry, https://bitbucket.org/jagatsastry/
[24] Ott, Cardie and Hancock, Estimating the prevalence of deception in online review communities, WWW 2012

Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
Marketing internship report file for MBA
Marketing internship report file for MBAMarketing internship report file for MBA
Marketing internship report file for MBA
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
 
Digital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion DesignsDigital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion Designs
 
Best Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDABest Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDA
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 

NLP Literature Survey with focus on Computerized Deception Detection

  • 1. Natural Language Processing Literature Survey: Overview of Computerized Deception Detection. Yoav Francis, IDC Herzelia, 11/06/2013
  • 2. Part 1 - Topics of Interest
    1. Sentiment Analysis
    2. Sign Language (capture and recognition)
    3. Computational Creative Naming
    4. Computerized Deception Detection
    5. NLP Approaches for Multiword Expressions
    6. Answer Extraction
    7. Natural Language Generation
    8. Automatic Text Summarization
    9. NLP-based Bibliometrics
    10. Natural Language User Interfaces for Relational Databases
    11. Truecasing - Restoring Case Information for badly/non-cased text
    12. The Web as a Corpus
  • 3. Part 2 - Extension on 4 Selected Topics

    1 - Sign Language (capture and recognition)
    American Sign Language is the primary means of communication for around 1.5 million deaf people in the United States [2]. It is a visual-gestural language using upper body gestures. There is no written form of sign language - currently corpora take the form of videos [13] - and the NLP field may need to adapt to this research field. There have been a few attempts to create sign language corpora ([13]), but they have yet to be studied from a linguistic / NLP perspective. New tools, and adaptations of existing tools, need to be developed in order to face this challenge - with regard to timing, spatial reference, inflection and new methods of unified motion capture for use in sign language analysis.

    2 - Truecasing
    Truecasing is the problem of determining the proper capitalization of a sentence or document when it is uncapitalized or wrongly capitalized. It applies mainly to English and to any language whose script distinguishes between lower and upper case letters. The problem is irrelevant for languages that are not written in the Latin, Cyrillic, Greek or Armenian alphabets. Truecasing is an aid for many tasks (besides readability, of course) such as entity recognition, translation and content extraction. The main aim of the process is to restore case information to raw text ([11, 14]).

    3 - Natural Language User Interfaces for Relational Databases
    A natural language interface for a database allows the user to type in natural language queries (such as: what buses leave at 16:00 from Tel-Aviv?), which are then transformed into an SQL query. This translation phase poses an NLP challenge in some regards - it requires a morphological and syntactic analysis, followed by a semantic analysis, in order to transform the user's question into a few intermediate-language representations that correspond to the possible interpretations of the question, before choosing the one that will be transformed into an SQL query. This architecture is formally known as a Natural Language Interface for Databases (NLIDB). A popular implementation of such an NLIDB is called Edite and is widely available ([10, 15]).

    4 - Computerized Deception Detection
    Computerized deception detection is the process of assessing the authenticity and truthfulness of a given text (for example, detecting someone who writes false reviews). Methods for doing so can be purely lexical (in the sense that they simply use dictionary word counts), or can use POS tagging and n-grams for a higher rate of success. Previous insights include, for example, that deceivers use verbs and pronouns more often. More complex approaches, which yield better detection rates, examine the syntactic stylometry of the text by using CFG parse trees. One application is the detection of fake reviews (Opinion Spam) ([4, 16, 17]).
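The purely lexical method mentioned above (dictionary word counts) can be illustrated with a minimal sketch. The pronoun list and the function names `tokenize` and `pronoun_rate` are assumptions made for this illustration only; they are not taken from any of the cited papers, and a real system would rely on a curated dictionary or a POS tagger.

```python
import re

# Hypothetical, deliberately tiny pronoun dictionary (an assumption for this
# sketch); real systems use curated word lists or a POS tagger instead.
PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your",
            "he", "she", "it", "they", "them", "their"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def pronoun_rate(text):
    """Return the fraction of tokens found in PRONOUNS (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in PRONOUNS)
    return hits / len(tokens)


# Usage: 2 pronouns ("my", "i") out of 10 tokens.
print(pronoun_rate("My husband and I stayed at the James Chicago Hotel"))  # 0.2
```

A single rate like this is only one feature; on its own it would not be expected to separate truthful from deceptive text reliably.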
  • 4. Part 3 - 2-Page Survey - Computerized Deception Detection

    Deception detection, or deceptive opinion detection, is the task of deciding whether a given text that carries some opinion is deceptive (or false). To clarify what this means, take, for instance, a hotel review site - an adversary may post a review that was deliberately written to sound authentic and to deceive the reader into believing that the review is truthful. The `deception` referred to in this summary is that of user reviews / opinions. The task at hand is therefore, given some text (or review), to decide whether the review is truthful.

    The need for this is rather clear - preventing deceptive opinion spam [17] in media where reviews or opinions are written or posted. Nowadays, when crowdsourcing platforms such as Amazon Mechanical Turk exist, deceptive opinions can easily be generated and can bias a user for better or for worse. The task is quite a challenge, since we want as few false positives as possible, and the task itself involves many aspects of the field of natural language processing.

    As with many other natural-language-based tasks, this task also requires some data - for example, from review websites (in [4], for example, data from TripAdvisor was used). We need some reviews that are guaranteed to be truthful and some that are guaranteed to be deceptive - that is, a gold dataset against which we can compare our evaluation. It is worth noting that even without an applicable gold dataset, there exists a heuristic approach for evaluation ([19]). In turning to evaluate deceptive reviews, we shall assume that such a gold set exists (in [17], the gold set of deceptive reviews was generated using Amazon Mechanical Turk). As for the truthful part of the gold set, truthful reviews can be collected from authenticated and reputable users (as was also done in [17]). Such datasets, which can be domain-specific, are publicly available ([20]).

    Before attempting a machine-based evaluation, it is interesting to inspect the performance of human evaluation. [17] reports that human detection of deceit is poor: in their test, the maximum average accuracy in correctly telling truth from deceit was 61%, and the agreement between different people's decisions regarding a given review was almost at chance.

    As for an automated, NLP-based approach to the issue, several approaches exist.

    One approach uses the frequencies of POS tags as the basis for deciding whether a given text is deceitful or truthful. In the analysis in [17], this method was shown to have the lowest accuracy of all the machine-based methods.

    A second approach is based on psycholinguistics, with the aim of detecting personality traits. A widely available tool for this exists (LIWC, [21]). It is essentially a somewhat more socially-oriented variant of the previous POS-tagging mechanism. Analysis of this method yielded slightly better results than the POS-based one.

    A third approach introduces n-grams into the model and treats the problem as text categorization. This type of classification dramatically increased the success of detection and yielded an accuracy of ~88%. This signifies that the context of words in the sentence (that is, n-gram-based detection) is a major contributor when detecting deceptive opinions.
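The n-gram approach described above can be sketched as a small text classifier. The following is a minimal illustration only, assuming unigram and bigram features and a multinomial Naive Bayes model with add-one smoothing; it is not presented as the exact setup of [17]. The class name `NaiveBayes`, the helper `features` and the four toy training reviews are hypothetical and invented for the example; a real experiment would train on a gold dataset such as the one described above.

```python
import math
import re
from collections import Counter


def features(text):
    """Return the unigram and bigram features of a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


class NaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()   # number of documents per label
        self.feature_counts = {}        # label -> Counter of feature counts
        self.vocab = set()              # all features seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            counts = self.feature_counts.setdefault(label, Counter())
            for feat in features(text):
                counts[feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for feat in features(text):
                if feat in self.vocab:  # ignore features never seen in training
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data invented for this sketch; far too small for any real conclusion.
train_texts = [
    "the room was clean and the staff were helpful",
    "check in took ten minutes and the room was small",
    "my husband and i loved this amazing luxury hotel",
    "i will definitely be back this hotel is fantastic",
]
train_labels = ["truthful", "truthful", "deceptive", "deceptive"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("my husband and i loved it"))
```

Bigrams are what let the model use local word context rather than isolated word counts, which is the property the survey credits for the accuracy gain.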
  • 5. Finally, a recently published article [4] suggested an even more novel approach - taking into account the `syntactic stylometry` of the text (that is, evaluating the similarity of different opinions based on the 'style of writing'). According to [22], similar work on syntactic stylometry has been done for authorship attribution, and even for age attribution of blogs [23]. This method can be implemented with techniques based on Probabilistic Context Free Grammar (PCFG) parse trees, as this is the most prominent technique for the analysis of syntactic stylometry [17, 22, 23]. The previously mentioned methods are based only on shallow lexico-syntactic features. In [4], analysis of this method yielded strong statistical evidence of deep syntactic patterns that allow us to detect deceitful texts with very high accuracy (91.2%).

    It is also worth noting that in all the machine models suggested above, the precision and recall values were very close to each other, as can be seen in the comparison table in [17].

    Further research has also been done on duplicate opinion detection (in the sense that the same writer wrote duplicate reviews, but wrote each in a `different way`), and on specific deception detection techniques that can be model-specific ([18]).

    As a quick test for the reader, and to illustrate the lack of human skill in evaluating deceit, have a look at figure 1 and see if you can tell which review is truthful and which is deceitful (taken from [17]).

    Figure 1: Truthful and Deceitful Reviews/Opinions

    1. I have stayed at many hotels traveling for both business and pleasure and I can honestly stay that The James is tops. The service at the hotel is first class. The rooms are modern and very comfortable. The location is perfect within walking distance to all of the great sights and restaurants. Highly recommend to both business travelers and couples.

    2. My husband and I stayed at the James Chicago Hotel for our anniversary. This place is fantastic! We knew as soon as we arrived we made the right choice! The rooms are BEAUTIFUL and the staff very attentive and wonderful!! The area of the hotel is great, since I love to shop I couldn't ask for more!! We will definatly be back to Chicago and we will for sure be back to the James Chicago.

    Future work obviously includes adapting the above methods to other problem domains, for example, reviews of other kinds, or any platform where user feedback and opinion is possible. Deception is a rather prevalent phenomenon ([24]) in many media where users can express their opinions. Another interesting direction would be to analyze deception and truthfulness on combined data from many different datasets (for example, hotel reviews, movie reviews, products, etc.) and to see whether we can come up with a valid deception criterion for texts from any of these domains, rather than a criterion for one specific domain based on training for that domain only.

    Personally, and to conclude - I found the topic of deception detection and its relation to NLP quite fascinating, and very much enjoyed reading the relevant papers on the subject. It seems like we are `almost there` in creating and streamlining a product that will be able to detect deceptive opinions on the web (or anywhere else).
  • 6. Part 4 - References
    [1] Becky Sue Parton, Sign Language Recognition and Translation: A Multidisciplined Approach From the Field of Artificial Intelligence, Journal of Deaf Studies, 2011
    [2] Lu and Huenerfauth, Collecting a Motion-Capture Corpus of American Sign Language for Data-Driven Generation Research, NAACL HLT, 2010
    [3] Ozbal and Strapparava, A Computational Approach to the Automation of Creative Naming, ACL 2012
    [4] Feng, Banerjee and Choi, Syntactic Stylometry for Deception Detection, ACL 2012
    [5] Sag, Baldwin et al., Multiword Expressions: A Pain in the Neck for NLP, Stanford University LinGO Project, 2001
    [6] Abney, Collins and Singhal, Answer Extraction, ATT Shannon Labs, ANLC 2000
    [7] Reiter and Dale, Building Natural Language Generation Systems, Cambridge Press, 2000
    [8] Hahn and Reimer, Advances in automatic text summarization, MIT Press, 1999
    [9] Abu-Jbara, Ezra and Radev, Purpose and Polarity of Citation: Towards NLP-based Bibliometrics, NAACL-HLT 2013
    [10] Filipe and Mamede, Databases and Natural Language Interfaces, CSTC Portugal, 2007
    [11] Lita, Roukos et al., tRuEcasIng, ACL 2003
    [12] Kilgarriff and Grefenstette, Introduction to the Special Issue on the Web as Corpus, ACL 2003
    [13] Segouat and Braffort, Toward Categorization of Sign Language Corpora, AFNLP 2009
    [14] English Wikipedia, Truecasing
    [15] Stratica, Kosseim and Desai, NLIDB Templates for Semantic Parsing, Concordia University, Canada
    [16] Argamon, Koppel and Avneri, Style-based Text Categorization: What Newspaper Am I Reading?, AAAI 1998
    [17] Ott et al., Finding Deceptive Opinion Spam by Any Stretch of the Imagination, ACL 2011
    [18] Jindal and Liu, Opinion Spam and Analysis, WSDM 2008
    [19] Wu et al., Distortion as a Validation Criterion in the Identification of Suspicious Reviews, SOMA 2010
    [20] TripAdvisor Ireland Dataset, http://mlg.ucd.ie/datasets/trip
    [21] Linguistic Inquiry and Word Count (LIWC) - http://www.liwc.net/
    [22] Hollingsworth, Syntactic Stylometry: Using Sentence Structure for Authorship Attribution, University of Georgia, 2012
    [23] Jaget Sastry, Blogger Age Attribution Using Syntactic Stylometry, https://bitbucket.org/jagatsastry/
    [24] Ott, Cardie and Hancock, Estimating the prevalence of deception in online review communities, WWW 2012