Final Project
Detecting Egotism in Text using
Deep Learning
Rahmatian, Mahyar
@Rahmatian, Mahyar
CSCI E-89 Deep Learning, Spring 2020
Harvard University Extension School
Prof. Zoran B. Djordjević
“The Ego is a veil between humans and God.” Rumi
 What is this ego that we need to identify and transcend?
 Egotism is an inflated opinion of one’s own qualities and importance, marked by an
amplified sense of self and self-importance. It is a destructive force that we can
recognize in text using Deep Learning.
 We will mainly use spaCy’s prebuilt statistical neural network models for Python
to perform tasks on English text. We’ll also train spaCy’s CNN model with our own
data (egoistic and non-egoistic sentences) to introduce new entity types for NER
(Named Entity Recognition). Other Python NLP libraries used in this project are NLTK
and Gensim.
 We’ll define 8 different methods to detect egotism in text.
 What is or is not egotistic may be subjective, but it should be fairly easy to
reflect a different definition in our detection methods. See the project report for
more detail on our definitions.
Preprocessing
 Cleanup
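import re
import preprocess_kgptalkie as ps

def get_clean(x):
    # Lowercase, drop backslashes, and turn underscores into spaces
    x = str(x).lower().replace('\\', '').replace('_', ' ')
    x = ps.remove_emails(x)
    x = ps.remove_urls(x)
    x = ps.remove_html_tags(x)
    x = ps.remove_accented_chars(x)
    x = ps.remove_special_chars(x)
    x = ps.make_base(x)  # reduce words to their base form
    # Collapse any character repeated 3 or more times into a single occurrence
    x = re.sub(r"(.)\1{2,}", r"\1", x)
    return x

DOCUMENT_cleaned = get_clean(DOCUMENT)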
 Summarize
# gensim.summarization was removed in Gensim 4.0, so this requires gensim < 4.0
from gensim.summarization import summarize
print(summarize(DOCUMENT, word_count=75, split=False))
5 Documents to Examine
 DOCUMENT: The text of a CNN news item, used as a reference point. We expect this to
be a neutral document.
 DOCUMENT_ego_a: A statement from President Trump about President-Elect
Biden. We expect this to be Egoistic!
 DOCUMENT_ego_b: A text segment from Donald Trump’s book, The Art of the Deal.
We expect this to be Egoistic!
 DOCUMENT_no_ego_a: A short article by Eckhart Tolle, the most popular
spiritual author in the United States and best-selling author of The Power of Now.
We expect this to be non-Egoistic!
 DOCUMENT_no_ego_b: Another short article by Eckhart Tolle.
We expect this to be non-Egoistic!
Method 1 – entities frequency
 The more entities a document mentions, the more egoistic we consider it. Use spaCy
to find all entities.
 Frequency of top 5 entities
 Average of DOCUMENT_ego: 17
 Average of DOCUMENT_no_ego: 2.5
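A minimal sketch of how this indicator could be computed. The slide does not define "frequency of top 5 entities", so summing the counts of the five most common entity strings is an assumption, as are the helper names `top_entity_frequency` and `spacy_entity_texts` and the model name `en_core_web_sm`.

```python
from collections import Counter


def top_entity_frequency(entity_texts, n=5):
    """Sum the occurrence counts of the n most frequent entity strings.

    Assumption: one plausible reading of "frequency of top 5 entities".
    """
    counts = Counter(text.strip().lower() for text in entity_texts)
    return sum(count for _, count in counts.most_common(n))


def spacy_entity_texts(document, model="en_core_web_sm"):
    """Return the text of every entity spaCy finds (needs spaCy and the model)."""
    import spacy  # imported lazily so the counting helper runs without spaCy

    nlp = spacy.load(model)
    return [ent.text for ent in nlp(document).ents]


# Hand-written entity list standing in for spacy_entity_texts(DOCUMENT)
sample_entities = ["Trump", "Biden", "Trump", "America", "trump", "Biden"]
print(top_entity_frequency(sample_entities, n=2))
```

Lowercasing before counting merges "Trump" and "trump"; whether the original notebook normalized entities this way is not known.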
Method 2 - tense
 Ego likes the past and the future, and dissolves in the present. Use NLTK word_tokenize
to find the tense of a document from its word inflections. The lower the share of
present tense, the more egotistic.
 Present %
 Average of DOCUMENT_ego: 69
 Average of DOCUMENT_no_ego: 71.5
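One way the present-tense share could be derived, sketched under assumptions: tokens are tagged with Penn Treebank part-of-speech tags, VBP/VBZ/VBG count as present, VBD/VBN as past, and the modals "will"/"shall" as future. The tag grouping and the helper names are hypothetical; the slide only names `word_tokenize`.

```python
PRESENT_TAGS = {"VBP", "VBZ", "VBG"}   # assumption: treated as present tense
PAST_TAGS = {"VBD", "VBN"}             # assumption: treated as past tense
FUTURE_MODALS = {"will", "shall", "'ll"}


def present_percent(tagged_tokens):
    """Percentage of tensed verb tokens that are in the present tense."""
    present = past = future = 0
    for word, tag in tagged_tokens:
        if tag in PRESENT_TAGS:
            present += 1
        elif tag in PAST_TAGS:
            past += 1
        elif tag == "MD" and word.lower() in FUTURE_MODALS:
            future += 1
    total = present + past + future
    return 100.0 * present / total if total else 0.0


def nltk_tagged_tokens(document):
    """Tokenize and tag with NLTK (needs nltk plus its tokenizer and tagger data)."""
    import nltk  # imported lazily so present_percent runs without NLTK

    return nltk.pos_tag(nltk.word_tokenize(document))


# Hand-tagged tokens standing in for nltk_tagged_tokens(DOCUMENT)
sample = [("I", "PRP"), ("am", "VBP"), ("here", "RB"),
          ("I", "PRP"), ("was", "VBD"), ("there", "RB"),
          ("I", "PRP"), ("will", "MD"), ("return", "VB")]
print(round(present_percent(sample), 1))
```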
Method 3 - plural
 The lower the percentage of plural verb/noun forms in use, the more egoistic.
 Plural percent
 Average of DOCUMENT_ego: 6.5
 Average of DOCUMENT_no_ego: 3.5
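A rough sketch of a plural percentage, assuming Penn Treebank tags as input. It covers nouns only (NNS/NNPS over all noun tags); the slide also mentions verbs and does not state the denominator, so both choices here are assumptions and the function name is hypothetical.

```python
PLURAL_NOUN_TAGS = {"NNS", "NNPS"}
NOUN_TAGS = {"NN", "NNP"} | PLURAL_NOUN_TAGS


def plural_percent(tagged_tokens):
    """Percentage of noun tokens that carry a plural tag (nouns only)."""
    noun_tags = [tag for _, tag in tagged_tokens if tag in NOUN_TAGS]
    if not noun_tags:
        return 0.0
    plural = sum(1 for tag in noun_tags if tag in PLURAL_NOUN_TAGS)
    return 100.0 * plural / len(noun_tags)


# Hand-tagged tokens; in practice these could come from a POS tagger
sample = [("The", "DT"), ("teams", "NNS"), ("built", "VBD"), ("a", "DT"),
          ("bridge", "NN"), ("for", "IN"), ("their", "PRP$"),
          ("neighbors", "NNS"), ("in", "IN"), ("Boston", "NNP")]
print(plural_percent(sample))
```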
Method 4 - pronoun
 Use spaCy pronoun detection to find separationist (I, mine, yours) vs inclusive (we,
ours) pronouns. Ego documents use fewer inclusive pronouns.
 Inclusive pronoun percent
 Average of DOCUMENT_ego: 3.5
 Average of DOCUMENT_no_ego: 17
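The idea can be illustrated without spaCy. The sketch below swaps spaCy's pronoun detection for a plain word-list lookup; the two pronoun sets extend the slide's examples and, like the choice of denominator (inclusive plus separationist pronouns), are assumptions.

```python
import re

# Assumed pronoun sets, extending the examples given on the slide
SEPARATIONIST = {"i", "me", "my", "mine", "myself",
                 "you", "your", "yours", "yourself"}
INCLUSIVE = {"we", "us", "our", "ours", "ourselves"}


def inclusive_pronoun_percent(document):
    """Share of inclusive pronouns among inclusive + separationist pronouns."""
    words = re.findall(r"[a-z']+", document.lower())
    inclusive = sum(1 for word in words if word in INCLUSIVE)
    separationist = sum(1 for word in words if word in SEPARATIONIST)
    total = inclusive + separationist
    return 100.0 * inclusive / total if total else 0.0


text = "I built this myself, and the credit is mine. We share our success."
print(inclusive_pronoun_percent(text))
```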
Method 5 - readability
 Ego likes high complexity in readability. Use the spacy_readability library to score a
document with 2 different methods, then simplify the average of those scores to
Easy, Hard, or Very Hard readability.
 Average of DOCUMENT_ego: Hard readability
 Average of DOCUMENT_no_ego: Hard readability
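The slide does not say which two readability scores were averaged or where the Easy/Hard/Very Hard cut-offs lie. As an illustration only, this sketch computes the Automated Readability Index and the Coleman-Liau index directly (rather than through spacy_readability) and buckets their average with hypothetical thresholds `hard_from` and `very_hard_from`.

```python
import re


def text_stats(document):
    """Return (letters_and_digits, words, sentences) using simple heuristics."""
    sentences = [s for s in re.split(r"[.!?]+", document) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", document)
    letters = sum(1 for ch in document if ch.isalnum())
    return letters, len(words), len(sentences)


def automated_readability_index(document):
    letters, words, sentences = text_stats(document)
    return 4.71 * letters / words + 0.5 * words / sentences - 21.43


def coleman_liau_index(document):
    letters, words, sentences = text_stats(document)
    letters_per_100 = 100.0 * letters / words
    sentences_per_100 = 100.0 * sentences / words
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


def readability_bucket(document, hard_from=7.0, very_hard_from=13.0):
    """Average the two grade-level scores and map them to a coarse label.

    The thresholds are hypothetical, not the project's actual cut-offs.
    """
    _, words, sentences = text_stats(document)
    if words == 0 or sentences == 0:
        return "Easy"
    grade = (automated_readability_index(document)
             + coleman_liau_index(document)) / 2
    if grade >= very_hard_from:
        return "Very Hard"
    return "Hard" if grade >= hard_from else "Easy"


print(readability_bucket("The cat sat. The dog ran."))
```

Both scores approximate a school grade level, which is why averaging them is at least dimensionally sensible.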
Method 6 - sentiment
 Ego likes negativity. Use NLTK’s SentimentIntensityAnalyzer to find the sentiment.
 Average of DOCUMENT_ego: neutral
 Average of DOCUMENT_no_ego: neutral
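A small sketch of turning a VADER compound score into the labels reported above. The ±0.05 cut-off is the convention commonly used with VADER; whether the project used it is an assumption, and `sentiment_label` and `vader_compound` are hypothetical helper names.

```python
def sentiment_label(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def vader_compound(document):
    """Compound score from NLTK's VADER (needs nltk and the vader_lexicon data)."""
    # Imported lazily so sentiment_label runs without NLTK installed
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    return SentimentIntensityAnalyzer().polarity_scores(document)["compound"]


# Example compound scores standing in for vader_compound(DOCUMENT)
for score in (0.62, 0.01, -0.40):
    print(score, sentiment_label(score))
```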
Method 7 - emotion
 Ego likes Angry, Surprise, Sad, and Fear, but not Joy (Happy). Use text2emotion to
detect emotions, then calculate score = Happy - (Angry + Surprise + Sad + Fear),
which ranges from +1 (max happy) to -1 (min happy).
 Emotion score
 Average of DOCUMENT_ego: -0.85
 Average of DOCUMENT_no_ego: -0.65
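The score formula can be sketched directly. The function below takes a dictionary shaped like the output of text2emotion's `get_emotion` (keys Happy, Angry, Surprise, Sad, Fear); the helper names are hypothetical.

```python
NEGATIVE_EMOTIONS = ("Angry", "Surprise", "Sad", "Fear")


def emotion_score(emotions):
    """score = Happy - (Angry + Surprise + Sad + Fear)."""
    negative = sum(emotions.get(name, 0.0) for name in NEGATIVE_EMOTIONS)
    return emotions.get("Happy", 0.0) - negative


def text2emotion_scores(document):
    """Emotion dictionary from text2emotion (needs the text2emotion package)."""
    import text2emotion as te  # imported lazily so emotion_score runs without it

    return te.get_emotion(document)


# Hand-written scores standing in for text2emotion_scores(DOCUMENT)
sample = {"Happy": 0.25, "Angry": 0.25, "Surprise": 0.0, "Sad": 0.25, "Fear": 0.25}
print(emotion_score(sample))
```

Because the five emotion values are proportions that sum to at most 1, the score stays within the -1 to +1 range stated above.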
Method 8 – training NER
 Train spaCy with sentences to learn two new entity types (NER): Egoistic and
non-Egoistic.
 To train the Egoistic entity, we need egoistic words. Each word must be used in
two sentences: one with an egoistic context and the other with a non-Egoistic
or neutral context.
 For example, “complain” is an egoistic word:
 Egoistic sentence: “She had done nothing but cry, complain and faint since
this ordeal had begun”
 Non-Egoistic sentence: “I have nothing to complain about”
Method 8 - training NER
 We start with seed words for both the Egoistic and Non-Egoistic entities. We then find
synonyms and antonyms for both sets and combine them into our Egoistic and
non-Egoistic word lists: synonyms stay with the seed word’s list, antonyms go to the
opposite list.
 For example:
 complain  criticize (synonym), applaud (antonym)
 gratitude  grateful (synonym), resentment (antonym)
 Combined Egoistic list = complain, criticize, resentment
 Combined Non-Egoistic list = gratitude, grateful, applaud
 We can find thousands of words, but here we select only about 20 words from each
category to write sentences with.
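The list-building step above can be sketched as follows. The synonym and antonym lookups are passed in as plain dictionaries because the slide does not say which thesaurus was used; `build_word_lists` is a hypothetical helper name.

```python
def build_word_lists(egoistic_seeds, non_egoistic_seeds, synonyms, antonyms):
    """Grow both word lists: synonyms join the seed's list, antonyms the other."""
    egoistic, non_egoistic = [], []

    def add(target, word):
        if word not in target:  # keep insertion order, skip duplicates
            target.append(word)

    for seed in egoistic_seeds:
        add(egoistic, seed)
        for word in synonyms.get(seed, []):
            add(egoistic, word)
        for word in antonyms.get(seed, []):
            add(non_egoistic, word)

    for seed in non_egoistic_seeds:
        add(non_egoistic, seed)
        for word in synonyms.get(seed, []):
            add(non_egoistic, word)
        for word in antonyms.get(seed, []):
            add(egoistic, word)

    return egoistic, non_egoistic


# The example from the slide
synonyms = {"complain": ["criticize"], "gratitude": ["grateful"]}
antonyms = {"complain": ["applaud"], "gratitude": ["resentment"]}
egoistic, non_egoistic = build_word_lists(
    ["complain"], ["gratitude"], synonyms, antonyms)
print(egoistic)
print(non_egoistic)
```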
Method 8 - training NER
 Training sentences for the Egoistic entity: on one line a word is used in an egoistic
context, and on the next line the same word is used in a non-egoistic context.
Method 8 - training NER
 Use the spaCy Matcher to help with labeling, e.g. {'entities': [(25, 35, 'EGOISTIC')]},
then apply a little manual formatting to get the final training text.
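To show what the annotation format encodes, here is a sketch that produces the same (start, end, label) character offsets. It swaps the spaCy Matcher for a plain regular-expression search, and `label_sentence` is a hypothetical helper name.

```python
import re


def label_sentence(sentence, word, label):
    """Return (text, {'entities': [(start, end, label), ...]}) for one word.

    Offsets are character positions, with the end offset exclusive.
    """
    pattern = r"\b" + re.escape(word) + r"\b"
    entities = [(match.start(), match.end(), label)
                for match in re.finditer(pattern, sentence, flags=re.IGNORECASE)]
    return sentence, {"entities": entities}


sentence = ("She had done nothing but cry, complain and faint since "
            "this ordeal had begun")
text, annotation = label_sentence(sentence, "complain", "EGOISTIC")
start, end, _ = annotation["entities"][0]
print(annotation, repr(text[start:end]))
```

Slicing the text with the returned offsets is a quick sanity check that each annotation really covers the intended word.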
Method 8 - training NER
 Training
Method 8 – EGOISTIC entity
 Finding our new EGOISTIC entity in our documents
 Average number of EGOISTIC entities for DOCUMENT_ego: 8.5
 Average number of EGOISTIC entities for DOCUMENT_no_ego: 5
Method 8 – training NER (non-EGOISTIC)
 A different set of words is used to write the paired sentences for the non-EGOISTIC entity.
Method 8 – non-EGOISTIC entity
 Finding our new non-EGOISTIC entity in our documents
 Average number of non-EGOISTIC entities for DOCUMENT_ego: 0.5
 Average number of non-EGOISTIC entities for DOCUMENT_no_ego: 2.5
Final Tally
 Scores from all methods (< means a lower number is better, i.e. less egotism).
 We see that 6 (in bold) out of 9 indicators correctly differentiated between the
egoistic and non-egoistic documents. Method 8 contributes two indicators.
 There is room to improve each of the indicators for greater differentiation.
 It is also possible to run more documents through the 9 indicators to gather
more rows, then feed those rows to a secondary neural network (NN).
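The tally can be reproduced from the averages reported for each method. In this sketch the expected direction per indicator follows each method's stated hypothesis, and the two categorical results (both "Hard", both "neutral") are encoded as equal numbers, which is an assumption. Whether the six indicators found here are the same six shown in bold on the slide cannot be confirmed from the text.

```python
# (indicator, ego average, no-ego average, direction expected for ego documents)
INDICATORS = [
    ("Method 1 entity frequency", 17, 2.5, "higher"),
    ("Method 2 present %", 69, 71.5, "lower"),
    ("Method 3 plural %", 6.5, 3.5, "lower"),
    ("Method 4 inclusive pronoun %", 3.5, 17, "lower"),
    ("Method 5 readability", 2, 2, "higher"),   # both "Hard", encoded as 2
    ("Method 6 sentiment", 0, 0, "lower"),      # both "neutral", encoded as 0
    ("Method 7 emotion score", -0.85, -0.65, "lower"),
    ("Method 8 EGOISTIC entities", 8.5, 5, "higher"),
    ("Method 8 non-EGOISTIC entities", 0.5, 2.5, "lower"),
]


def differentiates(ego, no_ego, expected):
    """True when the ego average falls on the expected side of the no-ego average."""
    return ego > no_ego if expected == "higher" else ego < no_ego


correct = [name for name, ego, no_ego, expected in INDICATORS
           if differentiates(ego, no_ego, expected)]
print(f"{len(correct)} of {len(INDICATORS)} indicators differentiate as expected")
```

Ties count as failures here, which is why the readability and sentiment indicators do not contribute.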
The End
 The associated notebook is a very good training ground for deep learning in NLP.
 It is very time consuming to generate a good set of labeled sentences to feed the
model. With more effort on labeled sentences, it should become easier to detect
egotism in text more accurately.
 It is possible to feed the results of all indicators to yet another Deep Learning
model and expect higher accuracy.
 Future Enhancements:
 Voice to text
 Web based
 Scoring improvements for individual methods
 Train with more labeled sentences
 Upgrade to spaCy 3.0 and use spacy-transformers with pretrained transformers like
BERT
 Resume Enhancer
“The sage battles his own ego, the fool battles
everyone else’s” - Rumi
YouTube URLs, Last Page
 Two minutes (short): https://youtu.be/9DYvJWaepc8
 15 minutes (long): https://youtu.be/KZqg6KqUyMg