• We live in an era of social networking, which is
why our project focuses on Twitter. In this project
we extract tweets and then classify them into
different categories. Extracting tweets also yields
a large amount of associated information.
• Tweet classification lets us estimate current
trends, such as the most popular language on
Twitter, the most talked-about person, trending
topics and much more.
5/29/2014Footer Text 2
• Extraction of tweets.
• Converting unstructured data into structured data.
• Pre-processing of data.
• Finding the most popular language on Twitter.
• Choosing features for the classification.
• Classifying the tweets into different categories.
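The structuring and pre-processing steps above might look roughly like the sketch below. It is only an illustration: the slides do not say how tweets were stored or cleaned, so the `text` and `lang` keys, the cleaning rules and the function names are all assumptions.

```python
import re

# Assumed cleaning rules: drop URLs and @mentions, keep hashtags as tokens.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"#?\w+")


def preprocess(text):
    """Return lowercase tokens with URLs and @mentions removed."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return TOKEN_RE.findall(text.lower())


def to_record(raw_tweet):
    """Turn a raw tweet dict (hypothetical layout) into a structured record."""
    text = raw_tweet.get("text", "")
    return {
        "text": text,
        "lang": raw_tweet.get("lang", "und"),  # "und" = undetermined
        "tokens": preprocess(text),
    }


record = to_record({"text": "Great goal by @player! #Football http://t.co/abc",
                    "lang": "en"})
```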
• SVMs (support vector machines) are supervised
learning models with associated
learning algorithms that analyse data and
recognize patterns. They are used for classification
and regression analysis.
• Given a set of training examples, each marked as
belonging to one of two categories, an SVM
training algorithm builds a model that assigns new
examples to one category or the other.
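To make the two-category description concrete, here is a minimal sketch of a linear SVM trained by sub-gradient descent on the hinge loss. It is not the implementation used in this project (the slides do not name one); the toy data, the learning rate `lr`, the regularisation strength `lam` and the function names are assumptions chosen for illustration.

```python
def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=200):
    """Fit weights w and bias b on the regularised hinge loss.

    X is a list of feature tuples, y a list of labels in {-1, +1}.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # The regularisation term shrinks w on every step.
            w = [wj * (1 - lr * lam) for wj in w]
            if margin < 1:
                # Example is misclassified or inside the margin:
                # move the separating plane towards classifying it correctly.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Assign a new example to category +1 or -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable training set (made up for this sketch).
X = [(2.0, 2.0), (3.0, 3.0), (2.5, 3.5),
     (-2.0, -2.0), (-3.0, -2.0), (-2.5, -3.0)]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

A classifier with more than two categories, such as sports, politics and entertainment, is commonly built from several such two-class models, for example one per category against the rest.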
Why SVM?
• Widely used for text classification.
• Often achieves high accuracy compared with other
algorithms.
• With the right features, an SVM can be robust even
when the training sample has some bias.
Calculating the most popular
language on Twitter
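One simple way to compute this is to count the language tag attached to each tweet. The sketch below assumes each tweet is a dict with a `lang` key; the function name and the sample data are hypothetical, and the actual method used in the project is not described in the slides.

```python
from collections import Counter


def language_popularity(tweets):
    """Return (language, count) pairs, most frequent first."""
    # Tweets without a language tag are counted as "und" (undetermined).
    counts = Counter(t.get("lang", "und") for t in tweets)
    return counts.most_common()


# Made-up sample, for illustration only.
sample = [{"lang": "en"}, {"lang": "es"}, {"lang": "en"}, {}]
ranking = language_popularity(sample)
```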
Popularity of languages
• Number of sports words.
• Number of politics words.
• Number of entertainment words.
• Lexical complexity.
• Number of hashtags.
• Number of digits.
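The six features above could be computed per tweet along the lines of the following sketch. The word lists are tiny placeholders, and lexical complexity is taken here to be the ratio of distinct words to total words; both are assumptions, since the slides do not give the actual lexicons or the definition used.

```python
import re

# Placeholder lexicons (hypothetical; the real lists are not in the slides).
SPORTS = {"goal", "match", "cricket", "football", "team"}
POLITICS = {"election", "minister", "vote", "parliament", "party"}
ENTERTAINMENT = {"movie", "song", "actor", "album", "film"}


def extract_features(text):
    """Map one tweet to the six numeric features listed above."""
    tokens = re.findall(r"#?\w+", text.lower())
    words = [t.lstrip("#") for t in tokens]
    total = len(words)
    return {
        "sports_words": sum(w in SPORTS for w in words),
        "politics_words": sum(w in POLITICS for w in words),
        "entertainment_words": sum(w in ENTERTAINMENT for w in words),
        # Assumed definition: distinct words divided by total words.
        "lexical_complexity": len(set(words)) / total if total else 0.0,
        "hashtags": sum(t.startswith("#") for t in tokens),
        "digits": sum(ch.isdigit() for ch in text),
    }


features = extract_features("India won the match by 3 goals #cricket #IPL2014")
```

Each tweet then becomes one row of numbers, which is the form an SVM expects as input.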
Values of features of
Feature values of testing data
set before application of SVM
Result of classification of
Graph of SVM and
When applied to the testing dataset, the SVM
classifies the data into the sports, entertainment and
politics categories with an accuracy of 97.5%.
• So far we have implemented the SVM to classify
tweets into general categories such as sports,
politics and entertainment. We will try to extend it
to categorize data into more specific categories so
that it can be used by the marketing and PR teams
of different organizations while they are choosing