Twitter sentiment analysis



A demonstration of how to perform classification and clustering, using sentiment analysis as the example application. First, we build a sentiment classifier using TF-IDF features and a linear-kernel SVM. Then we cluster the documents based on their TF-IDF vectors. I gave this demo for the Information Retrieval lecture in the Department of Computer Science and Engineering, University of Moratuwa, Sri Lanka.


Sentiment Analysis
Demonstration: Classification & Clustering
Yasas Senarath - Information Retrieval

Dataset and Tools Required
● Dataset
○ https://www.kaggle.com/c/si650winter11 (Training Dataset Only)
○ Predictions on the test set can be submitted to the competition.
● Tools Required
○ Python 3.6 (or another Python 3 version)
○ scikit-learn
○ NLTK (download the ‘stopwords’ corpus with nltk.download('stopwords'))

High Level Architecture
● Goals
○ Classify the sentiment of each sentence as "positive" or "negative".
○ Identify clusters of similar documents.
[Diagram: the Documents are both classified and clustered; the two results are combined to give each cluster's polarity.]

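(Polarity) Classification

Step 1: Loading Dataset
def read_dataset():
    # Each line of training.txt is "<label>\t<sentence>"
    with open('../resc/data/training.txt', 'r', encoding='utf-8') as f:
        records = list(zip(*[line.rstrip('\n').split('\t') for line in f]))
    # records[0] holds the labels, records[1] the sentences
    return records[1], records[0]

train_text, train_labels = read_dataset()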
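The loader relies on each line of training.txt being a label, a tab, and a sentence; that is what splitting on '\t' and returning `records[1], records[0]` implies. Below is a minimal sketch of the same parsing that reads from any iterable of lines, so the assumed format can be checked without the file. `parse_records` is a hypothetical helper name, and the two sample lines are made up.

```python
import io

def parse_records(lines):
    """Split lines of the form label<TAB>sentence into (texts, labels) tuples.

    Hypothetical helper mirroring read_dataset(); assumes the label comes
    first and is separated from the sentence by a single tab.
    """
    # maxsplit=1 keeps any later tabs inside the sentence text
    rows = [line.rstrip('\n').split('\t', 1) for line in lines if line.strip()]
    labels, texts = zip(*rows)  # transpose [(label, text), ...] into two tuples
    return texts, labels

# Made-up sample in the assumed format
sample = io.StringIO("1\tI loved this movie.\n0\tThis book was boring.\n")
texts, labels = parse_records(sample)
print(labels)  # ('1', '0')
```

Note that the labels come back as strings ('1', '0'), which matters later when the classifier is scored with F1.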

Step 2: Extracting Features
● We will try out TF-IDF features
from nltk.tokenize import TweetTokenizer
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer

# Requires the NLTK stop-word corpus: nltk.download('stopwords')
stops = set(stopwords.words('english'))

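Step 2: Extracting Features
● We will try out TF-IDF features
kwargs = {
    'encoding': 'utf-8',
    'preprocessor': None,
    'stop_words': list(stops),  # pass a list: newer scikit-learn versions reject a set
    'lowercase': True,
    'tokenizer': TweetTokenizer().tokenize
}
tfidfVec = TfidfVectorizer(**kwargs)
X_train = tfidfVec.fit_transform(train_text)  # learn the vocabulary and IDF weights
# X_test = tfidfVec.transform(test_text)      # reuse the fitted vocabulary on test data
X_train = X_train.toarray()  # dense array (the later steps also accept sparse input)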
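To see roughly what `TfidfVectorizer` produces with these settings, here is a pure-Python sketch of its default weighting: raw term counts multiplied by the smoothed IDF ln((1 + n) / (1 + df)) + 1, with each document vector then scaled to unit L2 norm. It assumes the documents are already tokenized and stop-word filtered (the job of `TweetTokenizer` and the NLTK stop-word list above); the `tfidf` name and the toy token lists are illustrative.

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF with scikit-learn's default weighting.

    docs: list of token lists (already tokenized and stop-word filtered).
    tf = raw count, idf = ln((1 + n) / (1 + df)) + 1, then each row is
    scaled to unit L2 norm. Returns (vocabulary, list of {term: weight}).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in docs:
        weights = {t: count * idf[t] for t, count in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({t: w / norm for t, w in weights.items()})
    return vocab, rows

docs = [["loved", "movie"], ["boring", "movie"], ["loved", "loved", "book"]]
vocab, rows = tfidf(docs)
print(vocab)  # ['book', 'boring', 'loved', 'movie']
```

Rare terms such as "book" get a higher IDF than terms shared across documents, while repeated terms such as "loved" in the third document gain weight through their count.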

Step 4: Training the Classifier
● Define the Classifier
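○ Let’s create a linear SVC (Support Vector Classifier) using scikit-learn's LinearSVC
● Training the classifier
from sklearn.svm import LinearSVC

svc = LinearSVC()  # linear kernel, default C=1.0
svc.fit(X_train, train_labels)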
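With its default settings, `LinearSVC` fits a linear decision function by minimizing an L2-regularized squared hinge loss, 0.5·||w||² + C·Σ max(0, 1 - sᵢ·w·xᵢ)². The sketch below minimizes that same objective with plain gradient descent so the idea fits in a few lines. The solver is a swapped-in choice for readability (scikit-learn uses liblinear), and the toy data, learning rate, and iteration count are assumptions.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, iters=1000):
    """Gradient descent on 0.5*||w||^2 + C * sum(max(0, 1 - s_i * w.x_i)^2).

    y holds 0/1 labels, mapped to s_i in {-1, +1}. A constant 1.0 feature is
    appended to every x, so the last weight acts as a (regularized) bias.
    """
    Xa = [list(x) + [1.0] for x in X]
    s = [1.0 if label == 1 else -1.0 for label in y]
    w = [0.0] * len(Xa[0])
    for _ in range(iters):
        grad = list(w)  # gradient of the 0.5*||w||^2 term
        for xi, si in zip(Xa, s):
            margin = si * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:  # only margin violators contribute to the loss
                coef = -2.0 * C * (1 - margin) * si
                grad = [g + coef * xj for g, xj in zip(grad, xi)]
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w

def predict(w, x):
    """Return 1 if the decision function w.x + bias is non-negative, else 0."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else 0

X_toy = [[2, 0], [3, 1], [0, 2], [1, 3]]
y_toy = [1, 1, 0, 0]
w = train_linear_svm(X_toy, y_toy)
print([predict(w, x) for x in X_toy])  # [1, 1, 0, 0]
```

The fixed step of 0.01 is small enough for this toy data; on a real TF-IDF matrix a dedicated solver such as the one inside `LinearSVC` is the practical choice.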

Step 4: Training the Classifier
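● Oops! Evaluating with F1 scoring (Step 5) fails with:
ValueError: pos_label=1 is not a valid label: array(['0', '1'], dtype='<U1')
● Fix: the labels were read as the strings '0' and '1', but the F1 scorer expects the positive label to be the integer 1, so encode them as integers first
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y = le.fit_transform(train_labels)  # '0' -> 0, '1' -> 1
svc = LinearSVC()
svc.fit(X_train, y)  # train on the encoded labels y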
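`LabelEncoder` maps the sorted distinct labels to 0, 1, 2, and so on, which turns '0' and '1' into the integers the F1 scorer expects. The sketch below reproduces that mapping in plain Python; `label_encode` is a hypothetical name. For this dataset, converting each label with `int()` would work equally well.

```python
def label_encode(labels):
    """Map each label to its index among the sorted distinct labels,
    the same mapping LabelEncoder.fit_transform produces."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes

y_enc, classes = label_encode(['1', '0', '1', '1'])
print(y_enc, classes)  # [1, 0, 1, 1] ['0', '1']
```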

Step 5: Evaluation
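● 5-Fold Cross-Validation
● Train / Test Split
from sklearn.model_selection import cross_val_score, train_test_split
X = X_train  # full TF-IDF matrix from Step 2; y holds the encoded labels
scores = cross_val_score(svc, X, y, cv=5, scoring='f1')
print('F1 per fold: {}, mean: {:.3f}'.format(scores, scores.mean()))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42, shuffle=True)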
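`cross_val_score(..., cv=5, scoring='f1')` splits the data into five folds, trains on four, computes F1 on the held-out fold, and repeats this for every fold. The sketch below spells out the two ingredients, fold splitting and the F1 metric, in plain Python. It is a simplification: for classifiers, an integer `cv` makes scikit-learn use stratified folds, whereas these folds are plain and unshuffled. `f1` and `kfold_indices` are hypothetical names.

```python
def f1(y_true, y_pred, pos_label=1):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in pairs if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in pairs if t == pos_label and p != pos_label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def kfold_indices(n, k=5):
    """Yield (train_idx, test_idx) for k contiguous, unshuffled folds.

    The first n % k folds get one extra sample, as in scikit-learn's KFold.
    """
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

print(f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # 2 TP, 1 FP, 1 FN
print([test for _, test in kfold_indices(7, 3)])  # [[0, 1, 2], [3, 4], [5, 6]]
```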

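Clustering

Step 1: Training the Clustering Algorithm
from sklearn.cluster import KMeans

NUM_CLUSTERS = 4
kmeans = KMeans(
    n_clusters=NUM_CLUSTERS,
    random_state=0  # fixed seed so the centroid initialization is reproducible
)
kmeans.fit(X)  # X: the TF-IDF matrix from the classification part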
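`KMeans` is based on Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the assignments stop changing. A minimal sketch on 2-D toy points follows. It assumes a simplified initialization (the first k points) instead of scikit-learn's default k-means++ seeding, and `lloyd_kmeans` is a hypothetical name.

```python
def lloyd_kmeans(points, k, max_iter=100):
    """Lloyd's k-means on a list of equal-length tuples.

    Simplified init: the first k points are the starting centroids
    (scikit-learn defaults to k-means++ seeding instead).
    Returns (labels, centroids).
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid (squared Euclidean)
        new_labels = [
            min(range(k), key=lambda c: sum((u - v) ** 2 for u, v in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments unchanged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels, toy_centroids = lloyd_kmeans(points, 2)
print(toy_labels)  # [0, 0, 0, 1, 1, 1]
```

Here both starting centroids lie in the same group, yet two rounds of reassignment are enough to separate the two groups.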

Step 2: Evaluating Clusters
from sklearn.metrics import silhouette_score

labels = kmeans.labels_  # cluster index assigned to each document
score = silhouette_score(X, labels)
print('Silhouette Score: {}'.format(score))
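The silhouette coefficient of a point compares a, its mean distance to the other members of its own cluster, with b, its mean distance to the members of the nearest other cluster: s = (b - a) / max(a, b). `silhouette_score` averages this over all points, so values near 1 indicate compact, well-separated clusters and values near 0 indicate overlapping ones. A plain-Python sketch with Euclidean distance (the default metric of `silhouette_score`) follows; `silhouette` is a hypothetical name, and the sketch assumes at least two clusters.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient using Euclidean distance.

    s(i) = (b - a) / max(a, b), where a is the mean distance from point i to
    the other members of its cluster and b is the smallest mean distance to
    the members of any other cluster. Points in singleton clusters score 0.
    Assumes labels is a list with at least two distinct values.
    """
    def dist(p, q):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(p, q)))

    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

score = silhouette([(0,), (1,), (10,), (11,)], [0, 0, 1, 1])
print(round(score, 3))  # 0.9
```

Every pairwise distance is computed, so this costs O(n²) distance evaluations, which is also why silhouette scoring gets slow on large document collections.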

Clusters...
● I really enjoyed the Da Vinci Code but thought I would be disappointed in the other books ….
● this was the first clive cussler i've ever read, but even books like Relic, and Da Vinci code were more plausible than this.
● Brokeback Mountain was amazing, and made me cry like a bitch.
● Brokeback Mountain is an excellent movie, I love it after watching it!
● The Da Vinci Code book is just awesome.
● i liked the Da Vinci Code a lot.
● friday i stayed in & watched Mission Impossible 3 which is amazing by the way.
● I LOVED Mission Impossible 3..
(Clusters labelled: Da Vinci Code, Brokeback Mountain, Mission Impossible)

Combining the two methods...
● A simple approach: find the percentage of positive documents in each cluster.
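One way to implement this is sketched below. It assumes a cluster assignment for every document (such as `kmeans.labels_`) and 0/1 sentiment predictions for the same documents in the same order (such as `svc.predict(X)`); `cluster_polarity` is a hypothetical helper, and the toy inputs are made up.

```python
from collections import Counter

def cluster_polarity(cluster_labels, sentiment):
    """Return {cluster: fraction of its documents with sentiment 1 (positive)}."""
    totals = Counter(cluster_labels)
    positives = Counter(c for c, s in zip(cluster_labels, sentiment) if s == 1)
    return {c: positives[c] / totals[c] for c in sorted(totals)}

# Made-up example: six documents in three clusters
for cluster, share in cluster_polarity([0, 0, 1, 1, 1, 2], [1, 0, 1, 1, 0, 0]).items():
    print('Cluster {}: {:.0%} positive'.format(cluster, share))
```

A cluster with a high share of positives (for example, one about a single movie) can then be read as a topic people mostly like.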
