Associated with Vellore Institute of Technology
This project uses TF-IDF feature extraction with Naive Bayes and Logistic Regression classifiers to detect signs of depression in social media text.
Classification Techniques: The data is classified into subclasses based on sentiment polarity, with features extracted using TF-IDF. Naive Bayes and Logistic Regression models are employed for sentiment analysis and depression detection.
To assess the performance of the classification models, evaluation metrics such as F1 score, accuracy, precision, and recall were used. Balanced values of these metrics indicated the effectiveness of the models in accurately detecting depression signs in social media data.
1. Social Media Sentiment Analysis for Depression Detection Using Machine Learning
GUIDE NAME: PROF. PRAKASH P
SUBMITTED BY: VIDYA SAURABH MISHRA (22MCA1055)
2. PROBLEM STATEMENT:
• Depression is one of the most common public health concerns.
• Social media platforms have become ubiquitous in our lives, providing a wealth of data that
researchers can leverage to gain insights into mental health conditions.
• Depression is pervasive and often goes undiagnosed in its initial stages.
3. LITERATURE REVIEW: Explores recent advancements and methodologies employed for social media sentiment analysis in the context of depression detection.

| Author | Title | Tools / Methods | Results | Dataset Sources |
| --- | --- | --- | --- | --- |
| Hameedur Rahman et al. 2022 [1] | Multi-Tier Sentiment Analysis of Social Media Text Using Supervised Machine Learning | Naïve Bayes, DT, Support Vector Machine | Acc: 79%; Pre: 81%; Rec: 96%; F1: 88% | Twitter, Facebook |
| Shen et al. 2017 [2] | Depression detection via harvesting social media: A multimodal dictionary learning solution | MDL, MSNL, WDL, NB | Acc: 85%; Pre: 85%; Rec: 85%; F1: 85% | Twitter |
| Hassan et al. 2017 [3] | Sentiment analysis of social networking sites (SNS) data using machine learning approach for the measurement of depression | SVM, NB, ME | Acc: 91%; Pre: 83%; Rec: 79% | Twitter |
| Chen et al. (2018) [4] | Sentiment Analysis Based on Deep Learning and Its Application in Screening for Perinatal Depression | LSTM | Results presented as graphs | WeChat |
4. LITERATURE REVIEW (continued):

| Author | Title | Tools / Methods | Results | Dataset Sources |
| --- | --- | --- | --- | --- |
| Islam et al. (2018) [5] | Depression detection from social network data using machine learning techniques | DT, KNN, SVM, Ensemble | Pre: 59%; Rec: 97%; F1: 73% | Facebook |
| Burdisso et al. (2019) [6] | A text classification framework for simple and effective early depression detection over social media streams. Expert Systems with Applications | SS3, KNN, LR, SVM, NB | Pre: 63%; Rec: 60%; F1: 61% | Reddit |
| Fatima et al. (2017) [7] | Prediction of postpartum depression using machine learning techniques from social media text | MLP, SVM, LR | Acc: 91.63%; Pre: 91.83%; Rec: 91.85% | Reddit |
| Lin et al. (2020) [8] | SenseMood: Depression Detection on Social Media. In: Proceedings of the 2020 International Conference on Multimedia Retrieval | CNN | Acc: 88.4%; Pre: 90.3%; Rec: 87%; F1: 93.6% | Twitter |
5. LITERATURE REVIEW (continued):

| Author | Title | Tools / Methods | Results | Dataset Sources |
| --- | --- | --- | --- | --- |
| Alsagri and Ykhlef (2020) [9] | Machine Learning-Based Approach for Depression Detection in Twitter Using Content and Activity Features | SVM, Naïve Bayes, Decision Trees | Acc: 82.5%; Pre: 73.91%; Rec: 85%; F1: 79.06%; AUC: 0.78 | Twitter |
| Kim et al. (2020) [10] | A deep learning model for detecting mental illness from user content on social media | CNN, XGBoost | Acc: 75.13%; Pre: 89.1%; Rec: 71.75%; F1: 79.49% | Reddit |
6. RESEARCH OBJECTIVES:
• Use machine learning models and sentiment analysis techniques for early detection of depression.
• Detect depressive symptoms early in people who are depressed.
7. PROPOSED SOLUTION:
Data Collection: Gather data from social media where users express their feelings and emotions. We use the Sentiment140 dataset, which contains 1.6 million tweets.
Preprocessing: Clean and preprocess the collected data, including text, emoticons, and emojis, to prepare it for analysis. This involves removing stopwords, user mentions, URLs, and digits. Stemming and tokenization are also performed in this step.
Feature Extraction: Extract relevant features from the data, such as emotional words, sentiment scores, and linguistic patterns, to capture correlations between different aspects of users' writings and depression. We use TF-IDF for feature extraction.
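The preprocessing step of the proposed solution (removing user mentions, URLs, digits and stopwords, then tokenizing and stemming) can be sketched as follows. This is a minimal illustration and not the project's actual code: the stopword list is a tiny made-up sample, `crude_stem` is a hypothetical stand-in for a real stemmer such as Porter, and emoticons and emojis are simply dropped here.

```python
import re

# Tiny illustrative stopword list (assumption); a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "am", "are", "i", "to", "and", "of", "so", "at", "my"}


def crude_stem(token: str) -> str:
    """Very rough suffix stripping; a stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "ly", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(tweet: str) -> list[str]:
    text = tweet.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)                   # remove user mentions
    text = re.sub(r"\d+", " ", text)                    # remove digits
    tokens = re.findall(r"[a-z]+", text)                # tokenize into alphabetic runs
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


print(preprocess("@bob I am crying and feeling sad at 3am http://t.co/xyz"))
# ['cry', 'feel', 'sad']
```

Because the tokenizer keeps only alphabetic runs, punctuation-based emoticons disappear along with digits; a pipeline that wants to keep them as features would need to map them to tokens before this step.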
8. Classification: Utilize machine learning techniques for sentiment analysis and depression detection. We have used Logistic Regression and Naïve Bayes for classification.
Model Training and Evaluation: Train and evaluate machine learning models using the extracted features and sentiment analysis to predict the level of depression in social media users' posts.
Prediction: The last step is to test the trained model on unseen data.
9. ARCHITECTURE DIAGRAM OF PROJECT WORK:
The Sentiment140 dataset underwent a series of preprocessing steps to ensure optimal performance of the classifiers.
After preprocessing, the next step was feature extraction. In this study, Term Frequency-Inverse Document Frequency (TF-IDF) was used for feature extraction [11].
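To make the TF-IDF step concrete, here is a small sketch of the textbook weighting, where a term's weight in a document is its relative frequency in that document multiplied by ln(N / df), with N the number of documents and df the number of documents containing the term. The function name `tfidf` and the toy documents are illustrative assumptions. Library implementations such as scikit-learn's `TfidfVectorizer` apply smoothing and normalization by default, so their values will differ from these.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Textbook TF-IDF: (count / doc length) * ln(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * idf[t] for t, c in counts.items()})
    return weights


# Toy, already-preprocessed tweets (made up for illustration).
docs = [["feel", "sad", "today"], ["feel", "happy", "today"], ["feel", "sad", "alone"]]
weights = tfidf(docs)
print(weights[2])
```

In this toy corpus "feel" occurs in every document, so its weight is zero, while "alone" occurs only once and receives the highest weight in the third document. That is the behaviour TF-IDF is chosen for: common words contribute little, distinctive words contribute more.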
We used Logistic Regression, a linear classifier that is widely used in Natural Language Processing (NLP) [12].
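As a rough illustration of why Logistic Regression is called a linear classifier, the sketch below computes p = sigmoid(w · x + b) and fits w and b by batch gradient descent on the log loss. The training procedure, learning rate, epoch count, and the two-feature toy data are all assumptions made for this example; the slides do not state how the model was trained.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability of the positive class for feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log loss (illustrative settings)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


# Toy features per tweet: [weight of "sad", weight of "happy"] (made up).
X = [[1.0, 0.0], [0.8, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [0, 0, 1, 1]  # 0 = negative, 1 = positive
w, b = train_logreg(X, y)
print(predict_proba(w, b, [0.9, 0.0]), predict_proba(w, b, [0.0, 0.9]))
```

The learned weight for "sad" comes out negative and the weight for "happy" positive, so the decision boundary is a straight line in feature space.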
We also used Naïve Bayes, a classifier commonly used for sentiment analysis that assumes conditional independence between every pair of features given the class [13].
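The conditional independence assumption means the score of a class is simply its log prior plus the sum of per-word log likelihoods. A minimal multinomial Naïve Bayes with Laplace smoothing might look like the sketch below. The class name `TinyMultinomialNB`, the smoothing value, and the toy tweets are assumptions; this version also works on raw word counts rather than the TF-IDF features used in the project.

```python
import math
from collections import Counter


class TinyMultinomialNB:
    """Illustrative multinomial Naive Bayes over token lists (not the project's code)."""

    def fit(self, docs, labels, alpha=1.0):
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in self.classes}
        term_counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            term_counts[c].update(doc)
        self.log_lik = {}
        for c in self.classes:
            # Laplace smoothing: add alpha to every vocabulary word's count.
            total = sum(term_counts[c].values()) + alpha * len(self.vocab)
            self.log_lik[c] = {
                t: math.log((term_counts[c][t] + alpha) / total) for t in self.vocab
            }
        return self

    def predict(self, doc):
        # Independence assumption: per-word log likelihoods are simply summed.
        scores = {
            c: self.log_prior[c] + sum(self.log_lik[c][t] for t in doc if t in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Toy preprocessed tweets (made up); 0 = negative, 1 = positive.
docs = [["feel", "sad", "alone"], ["cry", "sad"], ["feel", "happy"], ["love", "happy", "day"]]
labels = [0, 0, 1, 1]
nb = TinyMultinomialNB().fit(docs, labels)
print(nb.predict(["sad", "cry"]), nb.predict(["happy", "love"]))
# 0 1
```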
10. DATASET DESCRIPTION:
We use the Sentiment140 dataset, which contains 1.6 million tweets.
The dataset has two sentiment labels: negative (0) and positive (4).
In this project, the dataset is used to train machine learning models that detect depressive content on Twitter based on sentiment polarity.
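Since the labels are 0 and 4, a loader typically remaps them to 0 and 1 before training a binary classifier. The sketch below assumes the commonly distributed CSV layout of Sentiment140 (no header row; columns target, id, date, query flag, user, text); that layout is an assumption here and should be checked against the actual file. The two sample rows are made up.

```python
import csv
import io

# Two made-up rows in the assumed layout: target, id, date, flag, user, text.
SAMPLE = (
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","feeling so alone today"\n'
    '"4","2","Mon Apr 06 22:19:49 PDT 2009","NO_QUERY","user_b","what a lovely morning"\n'
)


def load_rows(handle):
    """Return (texts, labels) with labels remapped from {0, 4} to {0, 1}."""
    texts, labels = [], []
    for row in csv.reader(handle):
        target, text = row[0], row[5]
        labels.append(1 if target == "4" else 0)
        texts.append(text)
    return texts, labels


texts, labels = load_rows(io.StringIO(SAMPLE))
print(list(zip(labels, texts)))
```

For the real file, the same function can be passed an opened file handle instead of the in-memory sample; the file's encoding may need to be specified explicitly when opening it.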
11. MODULES IDENTIFIED:
Data Collection Module: Collect social media data from Sentiment140, where users expressed their emotions and feelings.
Preprocessing Module: Clean and preprocess the collected data to prepare it for analysis.
Feature Extraction Module: Extract relevant features from the data, such as emotional words and sentiment scores, to capture correlations between different aspects of users' writings and depression.
Classification Module: Utilize machine learning techniques for sentiment analysis and depression detection.
Model Training and Evaluation Module: Train and evaluate machine learning models using the extracted features and sentiment analysis to predict the level of depression in social media users' posts.
Prediction Module: In this module, we used our classification models to predict the sentiment of unseen text. The new input text is first transformed into a numerical format using the same vectorizer that was used during the training phase.
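The point about reusing the training-time vectorizer can be illustrated with a small sketch: the vocabulary and IDF values are learned once from the training data, and new text is mapped onto that fixed vocabulary, with unseen words ignored. `TinyTfidfVectorizer` is a hypothetical class written for this example, and its smoothed IDF formula is an assumption rather than the project's exact setting.

```python
import math
from collections import Counter


class TinyTfidfVectorizer:
    """Hypothetical minimal vectorizer: learns vocabulary and IDF once, then reuses them."""

    def fit(self, docs):
        self.vocab = sorted({t for d in docs for t in d})
        n = len(docs)
        df = Counter(t for d in docs for t in set(d))
        # Smoothed IDF (assumption) so every known term gets a positive weight.
        self.idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in self.vocab}
        return self

    def transform(self, doc):
        # Words never seen during training are ignored.
        counts = Counter(t for t in doc if t in self.idf)
        return [counts[t] * self.idf[t] for t in self.vocab]


train_docs = [["feel", "sad"], ["feel", "happy"]]
vec = TinyTfidfVectorizer().fit(train_docs)      # fitted once, at training time
new_vector = vec.transform(["so", "sad", "sad"])  # reused at prediction time
print(vec.vocab, new_vector)
```

The resulting vector always has one entry per training vocabulary term, which is what lets a classifier trained on the original features score the new text. Fitting a fresh vectorizer on the new text would produce columns that no longer line up with the learned weights.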
14. EVALUATION METRIC:
Both models performed well on the binary classification task, with Logistic
Regression achieving slightly higher accuracy. The choice between these two
models would depend on the specific requirements of the task. For instance, if the
cost of false positives is high, the Logistic Regression model would be a better
choice due to its higher precision.
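For reference, accuracy, precision, recall, and F1 can all be computed from the four confusion-matrix counts, as in the sketch below. The function name `classification_metrics` and the label vectors are made up for illustration and are not the project's results.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 3 true positives, 1 false positive, 1 false negative, 3 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision falls as false positives rise, which is why a model with higher precision is preferable when false positives are costly; recall is the metric to watch when missed cases matter more.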
16. REFERENCES:
[1] Rahman, Hameedur, et al. "Multi-tier sentiment analysis of social media text using supervised machine learning." Comput. Mater. Contin 74 (2023): 5527-5543.
[2] Shen G, Jia J, Nie L, Feng F, Zhang C, Hu T, Chua T-S, Zhu W. Depression detection via harvesting social media: A multimodal dictionary learning solution. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19-25 August 2017. pp 3838-3844. doi:10.24963/ijcai.2017/536
[3] Hassan AU, Hussain J, Hussain M, Sadiq M, Lee S. Sentiment analysis of social networking sites (SNS) data using machine learning approach for the measurement of depression. In: Proceedings of 2017 International Conference on Information and Communication Technology Convergence (ICTC), Jeju, South Korea, 18-20 Oct. 2017. pp 138-140. doi:10.1109/ICTC.2017.8190959
[4] Chen Y, Zhou B, Zhang W, Gong W, Sun G. Sentiment Analysis Based on Deep Learning and Its Application in Screening for Perinatal Depression. In: Proceedings of 2018 IEEE Third International Conference on Data Science in Cyberspace (DSC), 2018. pp 451-456. doi:10.1109/dsc.2018.00073
17. [5] Islam MR, Kabir MA, Ahmed A, Kamal ARM, Wang H, Ulhaq A (2018) Depression detection from social network data using machine learning techniques. Health Inf Sci Syst 6 (1):8. doi:10.1007/s13755-018-0046-0
[6] Burdisso SG, Errecalde M, Montes-y-Gómez M (2019) A text classification framework for simple and effective early depression detection over social media streams. Expert Systems with Applications 133:182-197. doi:10.1016/j.eswa.2019.05.023
[7] Fatima I, Abbasi BUD, Khan S, Al‐Saeed M, Ahmad HF, Mumtaz R (2019) Prediction of postpartum depression using machine learning techniques from social media text. Expert Systems 36 (4). doi:10.1111/exsy.12409
[8] Lin C, Hu P, Su H, Li S, Mei J, Zhou J, Leung H (2020) SenseMood: Depression Detection on Social Media. In: Proceedings of the 2020 International Conference on Multimedia Retrieval. Association for Computing Machinery, pp 407–411. doi:10.1145/3372278.3391932
[9] Alsagri HS, Ykhlef M (2020) Machine Learning-Based Approach for Depression Detection in Twitter Using Content and Activity Features. IEICE Transactions on Information and Systems E103.D (8):1825-1832. doi:10.1587/transinf.2020EDP7023
[10] Kim J, Lee J, Park E, Han J (2020) A deep learning model for detecting mental illness from user content on social media. Scientific reports 10 (1):11846-11846. doi:10.1038/s41598-020-68764-y
18. [11] Babu, N.V., Kanaga, E.G.M. Sentiment Analysis in Social Media Data for Depression Detection Using Artificial Intelligence: A Review. SN COMPUT. SCI. 3, 74 (2022). https://doi.org/10.1007/s42979-021-00958-1
[12] Hosmer, D.W., Jr.; Lemeshow, S.; Sturdivant, R.X. Applied Logistic Regression; John Wiley & Sons: New York, NY, USA, 2013.
[13] Singh, S., Chandra, S.K. (2023). Sentiment Analysis for Depression Detection and Suicide Prevention Using Machine Learning Models. In: Garg, L., et al. Key Digital Trends Shaping the Future of Information and Management Science. ISMS 2022. Lecture Notes in Networks and Systems, vol 671. Springer, Cham. https://doi.org/10.1007/978-3-031-31153-6_36