Tweets Classification

  1. Tweets Classification. Supervisor: Dr. Vikas Saxena. Names: Shubhangi Agarwal, Varun Ajay Gupta. Enrolment Nos.: 10104768, 10104730
  2. Introduction
     • We live in an era of social networking, which is why our project focuses on Twitter. In this project we extract tweets and then classify them into different categories. Extracting tweets also brings in a large amount of associated information.
     • Using tweet classification we can predict current trends, such as the most popular language on Twitter, the most talked-about person, burning topics and much more.
     5/29/2014
  3. Problem Statement
     • Extracting tweets.
     • Converting unstructured data into structured data.
     • Pre-processing the data.
     • Finding the most popular language on Twitter.
     • Choosing features for classification.
     • Classifying the tweets into different categories.
  4. Algorithm
     • SVMs (support vector machines) are supervised learning models with associated learning algorithms that analyse data and recognise patterns. They are used for classification and regression analysis.
     • Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other.
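The two-category training step described on this slide can be sketched in plain Python. This is a minimal illustration and not the authors' implementation: the deck does not say which SVM library or kernel was used, so the sketch assumes a linear SVM trained by sub-gradient descent on the hinge loss. The function names, hyperparameters and toy feature vectors (sports-word count, politics-word count) are all hypothetical.

```python
def train_linear_svm(samples, labels, epochs=50, learning_rate=0.1, reg=0.01):
    """Train a two-class linear SVM by sub-gradient descent on the hinge loss.

    samples: list of numeric feature tuples; labels: +1 or -1 for each sample.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            # L2 regularisation: shrink the weights a little on every step.
            weights = [w - learning_rate * reg * w for w in weights]
            if margin < 1:
                # The sample is inside the margin or misclassified:
                # move the separating hyperplane towards classifying it correctly.
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def predict(weights, bias, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


# Hypothetical toy data: (sports-word count, politics-word count).
# +1 stands for "sports", -1 for "politics".
samples = [(3, 0), (2, 0), (4, 1), (0, 3), (0, 2), (1, 4)]
labels = [1, 1, 1, -1, -1, -1]

weights, bias = train_linear_svm(samples, labels)
print(predict(weights, bias, (5, 0)))  # a tweet with five sports words
print(predict(weights, bias, (0, 5)))  # a tweet with five politics words
```

Because a single SVM separates only two categories, classifying into three categories would need several such classifiers combined (for example one per category); how the project handled this is not stated in the slides.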
  5. Why SVM?
     • Widely used for text classification.
     • High accuracy compared with other algorithms.
     • With the right choice of features, an SVM can be robust even when the training sample has some bias.
  6. Technology Used
     • Operating System: Ubuntu 12.04
     • Language: Python
     • Tools: gedit
     • Debugger: Python debugger
  8. Unstructured Tweets
  9. Structured Tweets
  10. Calculating the most popular language on Twitter
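One way to calculate the most popular language is to count a language field across the structured tweets. The sketch below assumes each structured tweet is a dictionary with a `lang` key holding a language code. The slides do not show the actual record layout, so that field name and the sample records are assumptions.

```python
from collections import Counter


def language_popularity(tweets):
    """Count tweets per language code, most frequent first.

    Tweets without a usable "lang" value are skipped.
    """
    counts = Counter(tweet["lang"] for tweet in tweets if tweet.get("lang"))
    return counts.most_common()


# Hypothetical structured tweets.
tweets = [
    {"text": "Good morning", "lang": "en"},
    {"text": "Buenos dias", "lang": "es"},
    {"text": "What a match!", "lang": "en"},
    {"text": "no language recorded"},
]

ranking = language_popularity(tweets)
print(ranking[0])  # → ('en', 2)
```

The ranked list returned here could also feed a bar or pie chart of language popularity.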
  11. Pictorially showing popularity of languages
  12. Features chosen
     • Number of sports words.
     • Number of politics words.
     • Number of entertainment words.
     • Lexical complexity.
     • Number of hashtags.
     • Number of digits.
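The six features listed above could be computed per tweet roughly as follows. This is a sketch under stated assumptions: the keyword sets are tiny hypothetical examples, and "lexical complexity" is taken here to mean the ratio of distinct words to total words, since the slides do not define it.

```python
import re

# Hypothetical keyword lists; the project's real word lists are not shown.
SPORTS_WORDS = {"match", "goal", "cricket", "team"}
POLITICS_WORDS = {"election", "minister", "vote", "party"}
ENTERTAINMENT_WORDS = {"movie", "song", "actor", "film"}


def extract_features(tweet):
    """Turn one tweet's text into the six numeric features."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return {
        "sports_words": sum(t in SPORTS_WORDS for t in tokens),
        "politics_words": sum(t in POLITICS_WORDS for t in tokens),
        "entertainment_words": sum(t in ENTERTAINMENT_WORDS for t in tokens),
        # Assumed definition: distinct words divided by total words.
        "lexical_complexity": len(set(tokens)) / len(tokens) if tokens else 0.0,
        "hashtags": len(re.findall(r"#\w+", tweet)),
        "digits": sum(ch.isdigit() for ch in tweet),
    }


features = extract_features("Great cricket match today, team scored 250 #cricket")
print(features)
```

Each tweet then becomes one row of numbers, which is the form the training and testing feature tables take before the SVM is applied.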
  13. Values of features of training set
  14. Feature values of testing data set before application of SVM
  15. Result of classification of tweets
  16. Graph of SVM and accuracy
  17. Conclusion
     Applied to the testing dataset, the SVM classifies the tweets into the sports, entertainment and politics categories with an accuracy of 97.5%.
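Accuracy here means the fraction of test tweets whose predicted category matches the true one. A minimal sketch of that calculation follows. The label lists are made-up examples, not the project's data, and the function name is hypothetical.

```python
def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        return 0.0
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical example: three of four predictions are correct.
true_labels = ["sports", "politics", "sports", "entertainment"]
predicted = ["sports", "politics", "entertainment", "entertainment"]
print(accuracy(true_labels, predicted))  # → 0.75
```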
  18. Future Work
     • So far we have implemented the SVM to classify tweets into general categories such as sports, politics and entertainment. We will try to extend it to categorise data into more specific categories, so that it can be used by the marketing and PR teams of different organizations when they choose their strategies.
  19. Thank You