
Predict Interestingness of An Article Using Twitter


The project aims at measuring the interestingness of articles by analyzing the tweets related to the entities in the article.


Application:

We can order the articles for a search query according to their interestingness.

Suggesting news articles to users on websites

Published in: Technology, Education


  1. Predict the Interestingness of an Article Using Twitter
     Chitra Khatwani, Yashasvi Girdhar, Khyati Chandu, R.K. Srinivas
  2. The project aims to measure the interestingness of articles by analyzing the tweets related to the entities in the article.
     ● Applications:
       – Ordering the articles returned for a search query by their interestingness.
       – Suggesting news articles to users on websites.
  3. Approach Followed
     ● Extract all the named entities from the article. Two methods can be followed:
       – Using the NLTK library
       – Using a list of Wikipedia titles
     We used the second approach because the NLTK library misses many important entities in some cases.
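
The title-list approach on this slide can be sketched as a greedy longest-match lookup of word n-grams against a set of known titles. This is only an illustration: the function name `extract_entities`, the `max_len` limit, and the three-entry title set are assumptions, not details from the slides. A real Wikipedia title list would also contain common words, so some extra filtering would likely be needed.

```python
import re

def extract_entities(text, titles, max_len=3):
    """Return phrases of up to max_len words that appear in the title set."""
    known = {t.lower() for t in titles}
    words = re.findall(r"[A-Za-z0-9']+", text)
    found = []
    i = 0
    while i < len(words):
        match_len = 0
        # Try the longest phrase first so "Barack Obama" beats "Barack".
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate.lower() in known:
                found.append(candidate)
                match_len = n
                break
        # Skip past a matched phrase, otherwise move one word forward.
        i += match_len if match_len else 1
    return found

# Hypothetical stand-in for a list of Wikipedia titles.
titles = {"Barack Obama", "United States", "Twitter"}
entities = extract_entities(
    "Barack Obama praised Twitter users in the United States.", titles
)
```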
  4. Approach Followed
     ● Shortlist the dominant entities from the extracted entities.
       – Dominant entities are those that are most frequently talked about in the article.
       – Methods:
         ● Decide based on the frequency of the entities
         ● Select entities that occur in the title of the article
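
A minimal sketch of how the two shortlisting methods might be combined: entities that occur in the title come first, and the rest are ranked by frequency. The ordering rule, the `top_k` cutoff, and the plain substring check against the title are assumptions made for illustration; the slides do not say how the two signals were weighed.

```python
from collections import Counter

def dominant_entities(entities, title, top_k=3):
    """Rank entities: title matches first, then by frequency in the article."""
    counts = Counter(e.lower() for e in entities)
    title_lower = title.lower()
    # Naive substring test; a real system would match on token boundaries.
    in_title = [e for e in counts if e in title_lower]
    by_freq = [e for e, _ in counts.most_common() if e not in in_title]
    return (in_title + by_freq)[:top_k]

# Hypothetical entity mentions extracted from one article.
mentions = ["Apple", "Google", "Apple", "Microsoft", "Google", "Apple", "Nokia"]
top = dominant_entities(mentions, "Nokia unveils new phone", top_k=2)
```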
  5. Approach Followed
     ● Mine all the tweets related to the dominant entities.
       – Done using the Twitter Search API.
       – The tweets for each entity need to be collected around the date the article was published.
       – The tweets need to be parsed before they are stored, to make them ready for the next steps.
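
The parsing and date-filtering steps might look like the sketch below. The call to the Twitter Search API itself is left out, since the slides give no details of it. The cleaning rules (dropping links, mentions, and punctuation) and the three-day window are assumptions, as are the function names.

```python
import re
from datetime import date

def clean_tweet(text):
    """Normalize a raw tweet into a list of lowercase word tokens."""
    text = re.sub(r"https?://\S+", " ", text)   # drop links
    text = re.sub(r"@\w+", " ", text)           # drop user mentions
    text = text.replace("#", " ")               # keep hashtag words, drop '#'
    text = re.sub(r"[^a-z0-9\s']", " ", text.lower())  # drop punctuation
    return text.split()

def near_publication(tweet_date, published, days=3):
    """True if the tweet falls within `days` of the article's publication."""
    return abs((tweet_date - published).days) <= days

tokens = clean_tweet("@bbc Great news about #Obama! http://t.co/xyz")
keep = near_publication(date(2014, 4, 12), date(2014, 4, 10))
```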
  6. Approach Followed
     ● Categorize each tweet as positive, negative, or neutral.
       – Treat all unigram tokens equally.
       – Score each token using the naive Bayes formula.
       – Sum the scores of all the tokens to calculate the score for an entity.
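
One common reading of "score each token using the naive Bayes formula" is a per-token log-likelihood ratio, summed over the tokens. The sketch below follows that reading. Add-one smoothing, equal class priors, and the `margin` threshold for the neutral class are assumptions, and the four training tweets are made up for illustration.

```python
import math
from collections import Counter

def train(labelled_tweets):
    """Count unigram occurrences per class from (tokens, label) pairs."""
    counts = {"pos": Counter(), "neg": Counter()}
    for tokens, label in labelled_tweets:
        counts[label].update(tokens)
    return counts

def token_score(token, counts):
    """log P(token | pos) - log P(token | neg), with add-one smoothing."""
    vocab_size = len(set(counts["pos"]) | set(counts["neg"]))
    p_pos = (counts["pos"][token] + 1) / (sum(counts["pos"].values()) + vocab_size)
    p_neg = (counts["neg"][token] + 1) / (sum(counts["neg"].values()) + vocab_size)
    return math.log(p_pos / p_neg)

def classify(tokens, counts, margin=0.5):
    """Sum the token scores; scores within +/- margin count as neutral."""
    score = sum(token_score(t, counts) for t in tokens)
    if score > margin:
        return "pos"
    if score < -margin:
        return "neg"
    return "neutral"

# Tiny made-up training set standing in for a labelled tweet corpus.
model = train([
    (["great", "win"], "pos"),
    (["love", "great"], "pos"),
    (["bad", "loss"], "neg"),
    (["hate", "bad"], "neg"),
])
```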
  7. Approach Followed
     ● Predict the interestingness of the article using the number of positive and negative tweets. We followed this approach:
       – The smaller the difference between the number of positive tweets and the number of negative tweets, the more interesting the article.
       – Conversely, if the positive entities clearly outnumber the negative entities, or vice versa, the article is considered less interesting.
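
The rule on this slide can be turned into a score in several ways. The sketch below uses one possibility: one minus the absolute difference between the counts, divided by the total number of polar tweets. That gives 1.0 for a perfectly split reaction and 0.0 for a one-sided one. Dividing by the total, and returning 0.0 when there are no tweets, are assumptions and do not come from the slides.

```python
def interestingness(num_pos, num_neg):
    """Score in [0, 1]; higher when positive and negative counts are closer."""
    total = num_pos + num_neg
    if total == 0:
        return 0.0  # no polar tweets: treated as uninteresting (assumption)
    return 1.0 - abs(num_pos - num_neg) / total

# Hypothetical (positive, negative) tweet counts per article.
articles = {"a": (50, 50), "b": (90, 10), "c": (30, 20)}
ranked = sorted(articles, key=lambda k: interestingness(*articles[k]), reverse=True)
```

Sorting by this score gives the ordering needed for the search-ranking application described earlier in the deck.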
  8. Datasets Used
     ● For articles
       – A set of random news articles taken from the BBC News Dataset
     ● For sentiment analysis
       – Mejaj Dataset
         ● Built by categorizing tweets against a predefined list of positive and negative words
       – Stanford Dataset
  9. Challenges
     ● Collecting the right set of articles for testing our model
     ● Finding the right Twitter dataset, and then deciding on the parameters used to categorize a tweet
     ● Choosing an appropriate algorithm for determining the interestingness of an article from its positive and negative tweets
  10. Conclusion
     ● Social media, such as Twitter in this case, is a very common medium for people to express their opinions nowadays. It can be leveraged as a powerful signal for predicting the nature of the data published on the web, especially the millions of articles published each day. It can also be used to suggest articles to users.
     References
     ● Mining Sentiments from Tweets, SIEL, IIIT-Hyderabad
