Sentiment Analysis
Demonstration: Classification & Clustering
Yasas Senarath - Information Retrieval
Dataset and Tools Required
● Dataset
  ○ https://www.kaggle.com/c/si650winter11 (training dataset only)
  ○ You will be able to submit predictions for the testing set.
● Tools Required
  ○ Python 3.6 (or other)
  ○ Scikit-Learn Toolkit
  ○ NLTK (you will have to download the 'stopwords' corpus using nltk.download('stopwords'))
High Level Architecture
● Goals
  ○ To classify the sentiment of each sentence as "positive" or "negative".
  ○ To identify clusters.
[Figure: pipeline diagram. Documents go to Classify (polarity classification) and to Cluster; the two results are combined into a cluster polarity.]
Step 1: Loading Dataset
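def read_dataset():
    # Each line of the training file is '<label>\t<sentence>'.
    with open('../resc/data/training.txt', 'r', encoding='utf-8') as f:
        records = list(zip(*[line.rstrip('\n').split('\t') for line in f]))
    # records[0] holds the labels, records[1] the sentences.
    return records[1], records[0]

train_text, train_labels = read_dataset()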
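To make the expected file layout concrete, here is a minimal sketch that parses two made-up lines in the same `label<TAB>sentence` layout from an in-memory buffer rather than from `training.txt`. The sample sentences and the `parse_lines` helper are illustrative assumptions, not part of the dataset or of the original code.

```python
import io

def parse_lines(handle):
    """Split 'label<TAB>sentence' lines into (texts, labels) tuples."""
    # Skip blank lines; split only on the first tab so tabs inside a sentence survive.
    rows = [line.rstrip('\n').split('\t', 1) for line in handle if line.strip()]
    labels, texts = zip(*rows)
    return texts, labels

# Hypothetical sample in the layout the loader above expects.
sample = io.StringIO("1\tI loved this movie.\n0\tThis book was boring.\n")
texts, labels = parse_lines(sample)
print(texts)
print(labels)
```

Note that the labels come back as strings ('0' and '1'), which matters later when a metric expects integer labels.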
Step 2: Extracting Features
● We will try out TF-IDF features
from nltk import TweetTokenizer
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
stops = set(stopwords.words('english'))
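kwargs = {
    'encoding': 'utf-8',
    'preprocessor': None,
    'stop_words': list(stops),  # recent scikit-learn versions expect a list, not a set
    'lowercase': True,
    'tokenizer': TweetTokenizer().tokenize
}
tfidfVec = TfidfVectorizer(**kwargs)
X_train = tfidfVec.fit_transform(train_text)  # learn vocabulary and IDF weights, then transform
# X_test = tfidfVec.transform(test_text)      # reuse the fitted vocabulary for unseen text
X_train = X_train.toarray()                   # convert the sparse matrix to a dense array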
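As a rough illustration of what the vectorizer produces, the sketch below runs `TfidfVectorizer` on a three-sentence toy corpus. The corpus is made up, and the example uses scikit-learn's default tokenizer with no stop-word list (instead of `TweetTokenizer` and the NLTK stop words) so that it runs without any NLTK downloads.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, invented for illustration only.
docs = [
    "i loved the movie",
    "i hated the movie",
    "the book was awesome",
]

vec = TfidfVectorizer()            # default tokenizer keeps tokens of 2+ characters
X_toy = vec.fit_transform(docs)    # sparse matrix: one row per document, one column per term

print(X_toy.shape)
print(sorted(vec.vocabulary_))     # the learned terms

# A term that occurs in every document gets a lower IDF weight than a rare one.
print(vec.idf_[vec.vocabulary_['the']], vec.idf_[vec.vocabulary_['loved']])
```

The single-character token "i" is dropped by the default token pattern, which is one reason the slides swap in a tweet-aware tokenizer.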
Step 4: Training the Classifier
● Define the classifier
  ○ Let's create an SVC (Support Vector Classifier)
● Train the classifier
from sklearn.svm import LinearSVC

svc = LinearSVC()
svc.fit(X_train, train_labels)
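● Oops!
ValueError: pos_label=1 is not a valid label: array(['0', '1'], dtype='<U1')
● Fix: the labels are the strings '0' and '1', while binary metrics such as F1 look for the integer label 1, so encode the labels as integers
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y = le.fit_transform(train_labels)  # '0' -> 0, '1' -> 1
svc = LinearSVC()
svc.fit(X_train, y)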
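The label fix can be checked in isolation. This small sketch uses a made-up list of string labels to show how `LabelEncoder` maps them to integers and back.

```python
from sklearn.preprocessing import LabelEncoder

raw_labels = ['1', '0', '1', '1']       # string labels, as they come out of the text file

le = LabelEncoder()
y_demo = le.fit_transform(raw_labels)   # classes are sorted, then numbered from 0

print(le.classes_)                      # the original label values, in sorted order
print(y_demo)                           # integer codes usable with scoring='f1'
print(le.inverse_transform(y_demo))     # map the codes back to the original strings
```

Keeping the fitted encoder around is useful when predictions have to be written out in the original label format.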
Step 5: Evaluation
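● 5-Fold Cross-Validation
from sklearn.model_selection import cross_val_score, train_test_split
X = X_train  # the full TF-IDF matrix from Step 2
scores = cross_val_score(svc, X, y, cv=5, scoring='f1')
● Train / Test Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42, shuffle=True)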
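One caveat with cross-validating on a matrix that was vectorized up front is that the IDF weights have already seen the held-out folds. A possible alternative, sketched here on made-up sentences, wraps the vectorizer and the classifier in a scikit-learn pipeline so that both are refit inside each fold. The toy data and the choice of three folds are assumptions made to keep the example small.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up sentences; 1 = positive, 0 = negative.
texts = [
    "i loved it", "this was awesome", "what an amazing film",
    "really great story", "i enjoyed every minute", "excellent and moving",
    "i hated it", "this was terrible", "what an awful film",
    "really boring story", "i disliked every minute", "dull and slow",
]
labels = [1] * 6 + [0] * 6

# The pipeline refits TF-IDF on the training part of every fold.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
scores = cross_val_score(model, texts, labels, cv=3, scoring='f1')

print(scores)          # one F1 score per fold
print(scores.mean())   # average F1 across folds
```

With so few sentences the scores themselves mean little; the point is only the shape of the workflow.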
Clustering
Step 1: Training the Clustering Algorithm
from sklearn.cluster import KMeans

NUM_CLUSTERS = 4
kmeans = KMeans(
    n_clusters=NUM_CLUSTERS,
    random_state=0  # fixed seed so the clustering is reproducible
)
kmeans.fit(X)
Step 2: Evaluating Clusters
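from sklearn.metrics import silhouette_score

labels = kmeans.labels_
score = silhouette_score(X, labels)  # between -1 and 1; higher means better-separated clusters
print('Silhouette Score: {}'.format(score))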
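The silhouette score can also guide the choice of `NUM_CLUSTERS`. The sketch below is not from the slides: it generates synthetic 2-D points in three well-separated groups, fits k-means for several values of k, and keeps the k with the highest silhouette score. The synthetic data and the range of k are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic points around three centres (illustrative data only).
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Fit k-means for each candidate k and record its silhouette score.
scores = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
    scores[k] = silhouette_score(points, km.labels_)

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

On real TF-IDF features the differences between candidate values of k tend to be much smaller than on this toy data, so the score is a hint rather than a decision rule.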
Clusters...
● Example sentences
  ○ I really enjoyed the Da Vinci Code but thought I would be disappointed in the other books….
  ○ this was the first clive cussler i've ever read, but even books like Relic, and Da Vinci code were more plausible than this.
  ○ Brokeback Mountain was amazing, and made me cry like a bitch.
  ○ Brokeback Mountain is an excellent movie, I love it after watching it!
  ○ The Da Vinci Code book is just awesome.
  ○ i liked the Da Vinci Code a lot.
  ○ friday i stayed in & watched Mission Impossible 3 which is amazing by the way.
  ○ I LOVED Mission Impossible 3..
● Cluster topics: Da Vinci Code, Brokeback Mountain, Mission Impossible
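One way to inspect clusters like these is to group the sentences by their k-means label. The sketch below does this for five of the sentences above. It is an illustration under assumptions (three clusters, scikit-learn's built-in English stop-word list, default tokenizer), and which sentences end up together depends on the features and the random seed.

```python
from collections import defaultdict

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "The Da Vinci Code book is just awesome.",
    "i liked the Da Vinci Code a lot.",
    "Brokeback Mountain is an excellent movie, I love it after watching it!",
    "friday i stayed in & watched Mission Impossible 3 which is amazing by the way.",
    "I LOVED Mission Impossible 3..",
]

X_demo = TfidfVectorizer(stop_words='english').fit_transform(sentences)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_demo)

# Collect the sentences that share a cluster label.
members = defaultdict(list)
for sentence, label in zip(sentences, km.labels_):
    members[int(label)].append(sentence)

for label in sorted(members):
    print(label, members[label])
```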
Combining the two methods...
A simple approach would be to find the percentage of positives for each cluster.
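The combination step can be sketched in a few lines of plain Python. The `positive_share` helper and the two input lists are hypothetical: in the workflow above they would correspond to `kmeans.labels_` and the classifier's predicted polarities for the same sentences.

```python
from collections import Counter

def positive_share(cluster_ids, polarities):
    """Return {cluster_id: percentage of members whose polarity is 1}."""
    totals, positives = Counter(), Counter()
    for cluster, polarity in zip(cluster_ids, polarities):
        totals[cluster] += 1
        positives[cluster] += int(polarity == 1)
    return {c: 100.0 * positives[c] / totals[c] for c in totals}

# Hypothetical outputs: one cluster id and one predicted polarity per sentence.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2, 2]
polarities = [1, 1, 0, 1, 1, 0, 0, 1, 0]

share = positive_share(cluster_ids, polarities)
print(share)
```

A cluster with a high percentage can then be read as a topic that is discussed mostly positively, and a low percentage as a mostly negative one.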
