Final Project
Detecting Egotism in Text using
Deep Learning
Rahmatian, Mahyar
@Rahmatian, Mahyar
CSCI E-89 Deep Learning, Spring 2020
Harvard University Extension School
Prof. Zoran B. Djordjević
“The Ego is a veil between humans and God.” - Rumi
• What is this ego that we need to identify and transcend?
• Egotism is an inflated opinion of one's own qualities and importance, marked by an amplified sense of self. It is a destructive force that we can learn to recognize in text using Deep Learning.
• We mainly use the prebuilt statistical neural network models in Python's spaCy to process English text. We also train spaCy's CNN model on our own data (egoistic and non-egoistic sentences) to introduce new entity types for NER (Named Entity Recognition). The other Python NLP libraries used in this project are NLTK and Gensim.
• We define 8 different methods to detect egotism in text.
• What counts as egotistic is subjective, but it should be fairly easy to adjust the detection methods to a different definition. See the project report for more detail on our definitions.
Preprocessing
• Cleanup
    import re
    import preprocess_kgptalkie as ps

    def get_clean(x):
        # Lowercase, drop backslashes, and turn underscores into spaces.
        x = str(x).lower().replace('\\', '').replace('_', ' ')
        x = ps.remove_emails(x)
        x = ps.remove_urls(x)
        x = ps.remove_html_tags(x)
        x = ps.remove_accented_chars(x)
        x = ps.remove_special_chars(x)
        x = ps.make_base(x)
        # Collapse any character repeated three or more times into one.
        x = re.sub(r"(.)\1{2,}", r"\1", x)
        return x

    DOCUMENT_cleaned = get_clean(DOCUMENT)
• Summarize
    # gensim.summarization was removed in Gensim 4.0; this needs gensim < 4.0.
    from gensim.summarization import summarize
    print(summarize(DOCUMENT, word_count=75, split=False))
5 Documents to Examine
• DOCUMENT: The text of a CNN news item, used as a reference point. We expect this to be a neutral document.
• DOCUMENT_ego_a: A statement from President Trump about President-Elect Biden. We expect this to be egoistic.
• DOCUMENT_ego_b: A text segment from Donald Trump's book, The Art of the Deal. We expect this to be egoistic.
• DOCUMENT_no_ego_a: A short article by Eckhart Tolle, the most popular spiritual author in the United States and best-selling author of The Power of Now. We expect this to be non-egoistic.
• DOCUMENT_no_ego_b: Another short article by Eckhart Tolle. We expect this to be non-egoistic.
Method 1 - entity frequency
• Hypothesis: the more entities a document mentions, the more egoistic it is. Use spaCy to find all entities.
• Frequency of the top 5 entities:
  • Average over the DOCUMENT_ego documents: 17
  • Average over the DOCUMENT_no_ego documents: 2.5
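The counting step of Method 1 can be sketched in plain Python. This is only an illustration, not the notebook's code: the function name, the case folding, and the sample entity list are assumptions, and the list stands in for the entity strings that spaCy would return.

```python
from collections import Counter

def top_entity_frequencies(entity_texts, n=5):
    """Return the n most frequent entity strings with their counts."""
    # Fold case and trim whitespace so "Trump" and "trump " are counted together.
    counts = Counter(text.strip().lower() for text in entity_texts)
    return counts.most_common(n)

# Hypothetical entity mentions, standing in for the texts of a spaCy doc's entities.
sample_entities = ["Trump", "Biden", "Trump", "America", "trump", "Biden"]
top5 = top_entity_frequencies(sample_entities)
```

With spaCy installed and an English model loaded as nlp, the input list could be built with [ent.text for ent in nlp(text).ents].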
Method 2 - tense
• Hypothesis: ego likes the past and the future and dissolves in the present, so the less present tense a document uses, the more egotistic it is. Use NLTK word_tokenize and the word inflections to find the tense of a document.
• Present tense %:
  • Average over the DOCUMENT_ego documents: 69
  • Average over the DOCUMENT_no_ego documents: 71.5
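One way to turn tagged tokens into a present-tense percentage is sketched below. It assumes Penn Treebank tags, the kind nltk.pos_tag produces, and the mapping from tags to tenses is a simplification chosen for this example (for instance, a past participle in a present-perfect phrase is counted as past). The notebook may count tenses differently.

```python
PRESENT_TAGS = {"VBP", "VBZ", "VBG"}  # present-tense forms and present participles
PAST_TAGS = {"VBD", "VBN"}            # past tense and past participles
FUTURE_MODALS = {"will", "shall"}     # modals treated as future markers (assumption)

def present_percent(tagged_tokens):
    """Percentage of tense-bearing tokens that are in a present form.

    tagged_tokens is a list of (word, tag) pairs.
    """
    present = past = future = 0
    for word, tag in tagged_tokens:
        if tag in PRESENT_TAGS:
            present += 1
        elif tag in PAST_TAGS:
            past += 1
        elif tag == "MD" and word.lower() in FUTURE_MODALS:
            future += 1
    total = present + past + future
    return 100.0 * present / total if total else 0.0

# Hand-tagged sample: one present, one past, and one future verb.
sample = [("I", "PRP"), ("am", "VBP"), ("here", "RB"),
          ("I", "PRP"), ("was", "VBD"), ("there", "RB"),
          ("I", "PRP"), ("will", "MD"), ("return", "VB")]
```

With NLTK and its tokenizer and tagger data installed, the input could come from nltk.pos_tag(nltk.word_tokenize(text)).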
Method 3 - plural
• Hypothesis: the lower the percentage of plural forms among the verbs and nouns used, the more egoistic the document.
• Plural percent:
  • Average over the DOCUMENT_ego documents: 6.5
  • Average over the DOCUMENT_no_ego documents: 3.5
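A rough sketch of a plural percentage follows. It is restricted to nouns, because Penn Treebank tags mark number on nouns (NNS, NNPS) much more cleanly than on verbs, and it divides by the number of nouns. Both choices are assumptions made for this example; the slide does not say which denominator the notebook uses.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
PLURAL_NOUN_TAGS = {"NNS", "NNPS"}

def plural_percent(tagged_tokens):
    """Percentage of nouns that carry a plural tag."""
    noun_tags = [tag for _, tag in tagged_tokens if tag in NOUN_TAGS]
    if not noun_tags:
        return 0.0
    plural = sum(tag in PLURAL_NOUN_TAGS for tag in noun_tags)
    return 100.0 * plural / len(noun_tags)

# Hand-tagged sample with two plural nouns out of four nouns.
sample = [("We", "PRP"), ("share", "VBP"), ("ideas", "NNS"), ("with", "IN"),
          ("friends", "NNS"), ("at", "IN"), ("home", "NN"), ("in", "IN"),
          ("Boston", "NNP")]
```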
Method 4 - pronoun
• Use spaCy pronoun detection to separate separationist pronouns (I, mine, yours) from inclusive ones (we, ours). Egoistic documents should show fewer inclusive pronouns.
• Inclusive pronoun percent:
  • Average over the DOCUMENT_ego documents: 3.5
  • Average over the DOCUMENT_no_ego documents: 17
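The inclusive-pronoun percentage can be approximated without a tagger. The sketch below swaps spaCy's pronoun detection for a regular expression and two fixed word sets; the sets extend the slide's examples and are assumptions, and a plain word match will also count lowercase "us" when it comes from "US".

```python
import re

INCLUSIVE = {"we", "us", "our", "ours", "ourselves"}
SEPARATIONIST = {"i", "me", "my", "mine", "myself",
                 "you", "your", "yours", "yourself", "yourselves"}

def inclusive_pronoun_percent(text):
    """Share of inclusive pronouns among inclusive plus separationist pronouns."""
    words = re.findall(r"[a-z]+", text.lower())
    inclusive = sum(word in INCLUSIVE for word in words)
    separationist = sum(word in SEPARATIONIST for word in words)
    total = inclusive + separationist
    return 100.0 * inclusive / total if total else 0.0

percent = inclusive_pronoun_percent("I built my company and we shared our success.")
```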
Method 5 - readability
• Hypothesis: ego likes text that is complex to read. Use the spacy_readability library to score a document with 2 different methods, then reduce the average of those scores to one of three levels: Easy, Hard, or Very Hard.
  • Average over the DOCUMENT_ego documents: Hard
  • Average over the DOCUMENT_no_ego documents: Hard
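The slide does not name the two readability measures or the cut-offs between levels, so the following is only a sketch under stated assumptions. It uses the standard Flesch-Kincaid grade-level formula as an example of a grade-style score, and the thresholds hard_from and very_hard_from are hypothetical values picked for illustration.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Standard Flesch-Kincaid grade level computed from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def readability_level(grade_scores, hard_from=8.0, very_hard_from=13.0):
    """Average several grade-style scores and reduce them to three levels.

    The two thresholds are hypothetical, not taken from the project.
    """
    average = sum(grade_scores) / len(grade_scores)
    if average < hard_from:
        return "Easy"
    if average < very_hard_from:
        return "Hard"
    return "Very Hard"

# 100 words in 5 sentences with 150 syllables.
grade = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
level = readability_level([grade, 11.0])
```

With coarse levels like these, documents of quite different difficulty can land in the same bucket, which is consistent with both averages above reading Hard.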
Method 6 - sentiment
• Hypothesis: ego likes negativity. Use NLTK SentimentIntensityAnalyzer to find the sentiment of a document.
  • Average over the DOCUMENT_ego documents: neutral
  • Average over the DOCUMENT_no_ego documents: neutral
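A sentiment label can be derived from the compound score that SentimentIntensityAnalyzer reports. The sketch below uses the conventional VADER cut-off of 0.05 on either side of zero; whether the notebook uses the same cut-off, and how it averages the two documents, is an assumption.

```python
def sentiment_label(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

def average_label(compound_scores):
    """Label the mean compound score of several documents."""
    return sentiment_label(sum(compound_scores) / len(compound_scores))

# Two hypothetical documents whose scores nearly cancel out.
label = average_label([0.4, -0.38])
```

With NLTK and its vader_lexicon data installed, a compound score could be obtained with SentimentIntensityAnalyzer().polarity_scores(text)["compound"].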
Method 7 - emotion
• Hypothesis: ego likes Angry, Surprise, Sad, and Fear, but not Joy (Happy). Use text2emotion to detect emotions, then calculate
  score = Happy - (Angry + Surprise + Sad + Fear)
  which ranges from +1 (max happy) to -1 (min happy).
• Emotion score:
  • Average over the DOCUMENT_ego documents: -0.85
  • Average over the DOCUMENT_no_ego documents: -0.65
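The emotion score formula can be written as a small function. This sketch assumes the input is a dictionary keyed by Happy, Angry, Surprise, Sad, and Fear whose values sum to at most 1, the shape text2emotion is expected to return; the function name and the sample values are illustrative.

```python
NEGATIVE_EMOTIONS = ("Angry", "Surprise", "Sad", "Fear")

def emotion_score(emotions):
    """Happy minus the sum of the other four emotions, between -1 and +1."""
    negative = sum(emotions.get(name, 0.0) for name in NEGATIVE_EMOTIONS)
    return emotions.get("Happy", 0.0) - negative

# Hypothetical emotion distribution for one document.
score = emotion_score({"Happy": 0.5, "Angry": 0.25, "Surprise": 0.0,
                       "Sad": 0.25, "Fear": 0.0})
```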
Method 8 - training NER
• Train spaCy on labeled sentences so that it learns two new entity types (NER): Egoistic and non-Egoistic.
• To train the egoistic entity we need egoistic words. Each word must be used in two sentences: one with an egoistic context and one with a non-egoistic or neutral context.
• For example, “complain” is an egoistic word:
  • Egoistic sentence: “She had done nothing but cry, complain and faint since this ordeal had begun”
  • Non-egoistic sentence: “I have nothing to complain about”
Method 8 - training NER
• We start with seed words for both the Egoistic and Non-Egoistic entities. We then find synonyms and antonyms for both sets and combine them into our Egoistic and Non-Egoistic word lists: synonyms stay on their seed's list, and antonyms move to the opposite list.
• For example:
  • complain → criticize (synonym), applaud (antonym)
  • gratitude → grateful (synonym), resentment (antonym)
  • Combined Egoistic list = complain, criticize, resentment
  • Combined Non-Egoistic list = gratitude, grateful, applaud
• We can find thousands of words this way, but here we select only about 20 words from each category and write sentences with them.
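The way seeds, synonyms, and antonyms combine into the two word lists can be sketched with the slide's own example. The small THESAURUS dictionary is a hand-written stand-in for whatever lexical resource the notebook queries, and the function name is hypothetical.

```python
# Toy thesaurus holding only the example words from the slide.
THESAURUS = {
    "complain": {"synonyms": ["criticize"], "antonyms": ["applaud"]},
    "gratitude": {"synonyms": ["grateful"], "antonyms": ["resentment"]},
}

def expand(seeds, opposite_seeds, thesaurus):
    """Seeds plus their synonyms, plus the antonyms of the opposite seeds."""
    words = []
    for seed in seeds:
        words.append(seed)
        words.extend(thesaurus[seed]["synonyms"])
    for seed in opposite_seeds:
        words.extend(thesaurus[seed]["antonyms"])
    return words

egoistic = expand(["complain"], ["gratitude"], THESAURUS)
non_egoistic = expand(["gratitude"], ["complain"], THESAURUS)
```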
Method 8 - training NER
• Training sentences for the Egoistic entity: each word is used in an egoistic context on one line and in a non-egoistic context on the next.
Method 8 - training NER
• Use the spaCy Matcher to help with labeling, for example {'entities': [(25, 35, 'EGOISTIC')]}, then apply a little manual formatting to get the final training text.
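The character offsets in a training annotation can also be computed directly. The sketch below uses a regular expression instead of the spaCy Matcher and returns the (text, {'entities': [...]}) tuple shape used for spaCy 2.x NER training data; the function name is hypothetical, and only the first occurrence of the word is labeled.

```python
import re

def label_word(sentence, word, label):
    """Return (sentence, annotations) with the first whole-word match labeled."""
    pattern = r"\b" + re.escape(word) + r"\b"
    match = re.search(pattern, sentence, flags=re.IGNORECASE)
    if match is None:
        return (sentence, {"entities": []})
    return (sentence, {"entities": [(match.start(), match.end(), label)]})

sentence = ("She had done nothing but cry, complain and faint "
            "since this ordeal had begun")
example = label_word(sentence, "complain", "EGOISTIC")
```

Slicing the sentence with the returned offsets is a quick check that the span covers exactly the labeled word.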
Method 8 - training NER
• Training
Method 8 - EGOISTIC entity
• Finding our new EGOISTIC entity in our documents:
  • Average number of EGOISTIC entities for DOCUMENT_ego: 8.5
  • Average number of EGOISTIC entities for DOCUMENT_no_ego: 5
Method 8 - training NER (non-EGOISTIC)
• A different set of words is used to write the paired sentences for the non-EGOISTIC entity.
Method 8 - non-EGOISTIC entity
• Finding our new non-EGOISTIC entity in our documents:
  • Average number of non-EGOISTIC entities for DOCUMENT_ego: 0.5
  • Average number of non-EGOISTIC entities for DOCUMENT_no_ego: 2.5
Final Tally
• Scores from all methods. ("<" means a lower number is better, that is, less egotism.)
• 6 of the 9 indicators (in bold) correctly differentiated between the two kinds of documents.
• There is room to improve each indicator for greater differentiation.
• It is also possible to run more documents through the 9 indicators, gather more rows, and feed those rows to a secondary neural network.
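The tally can be reproduced from the averages reported on the method slides. In this sketch each indicator carries the direction its hypothesis predicts; the ordinal encodings for readability and sentiment are assumptions added so that ties can be compared numerically. The resulting count agrees with the 6 of 9 stated above, although which six were shown in bold cannot be confirmed from the text.

```python
# (indicator, ego average, non-ego average, True if a higher value should mean more ego)
INDICATORS = [
    ("entity frequency", 17, 2.5, True),
    ("present tense %", 69, 71.5, False),
    ("plural %", 6.5, 3.5, False),
    ("inclusive pronoun %", 3.5, 17, False),
    ("readability level", 1, 1, True),   # both Hard; Easy=0, Hard=1, Very Hard=2
    ("sentiment", 0, 0, False),          # both neutral; negative=-1, neutral=0, positive=1
    ("emotion score", -0.85, -0.65, False),
    ("EGOISTIC entities", 8.5, 5, True),
    ("non-EGOISTIC entities", 0.5, 2.5, False),
]

def differentiates(ego, no_ego, higher_means_ego):
    """True when the averages differ in the direction the hypothesis predicts."""
    return ego > no_ego if higher_means_ego else ego < no_ego

correct = [name for name, ego, no_ego, higher in INDICATORS
           if differentiates(ego, no_ego, higher)]
```

Rows built this way, one per document rather than one per average, are the kind of input a secondary model could be trained on.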
The End
• The associated notebook is a very good training ground for deep learning in NLP.
• Generating a good set of labeled sentences to feed the model is very time consuming. With more effort on labeled sentences, egotism in text could be detected more accurately.
• It is possible to feed the results of all indicators to yet another Deep Learning model and expect higher accuracy.
• Future enhancements:
  • Voice to text
  • Web based
  • Scoring improvements for individual methods
  • Train with more labeled sentences
  • Upgrade to spaCy 3.0 and use spacy-transformers with pretrained transformers like BERT
  • Resume enhancer
“The sage battles his own ego, the fool battles
everyone else’s” - Rumi
YouTube URLs
• Two minutes (short): https://youtu.be/9DYvJWaepc8
• 15 minutes (long): https://youtu.be/KZqg6KqUyMg