# Predict Interestingness of An Article Using Twitter

The project aims at measuring the interestingness of articles by analyzing the tweets related to the entities in the article.

Application:

We can order the articles for a search query according to their interestingness.

Suggesting news articles to users on websites

Published in: Technology, Education

### Predict Interestingness of An Article Using Twitter

1. Predict the Interestingness of an Article Using Twitter
    - Chitra Khatwani, Yashasvi Girdhar, Khyati Chandu, R.K. Srinivas
2. The project aims at measuring the interestingness of articles by analyzing the tweets related to the entities in the article.
    - Application:
        - We can order the articles for a search query according to their interestingness.
        - Suggesting news articles to users on websites
3. Approach Followed
    - Extract all the named entities from the article. Two methods can be followed:
        - Using the NLTK library
        - Using a list of Wikipedia titles
    - We used the second approach, because the NLTK library misses many important entities in some cases.
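The title-list approach on slide 3 can be sketched as a longest-match lookup over word n-grams. This is a minimal illustration, not the authors' implementation: the function name `extract_entities`, the `max_words` limit, and the three-entry title set are all assumptions, and a real Wikipedia title list would need filtering of common words.

```python
import re

def extract_entities(text, titles, max_words=4):
    """Return titles found in text, preferring the longest match at each position."""
    # Case-insensitive lookup that maps back to the canonical title.
    lookup = {title.lower(): title for title in titles}
    words = re.findall(r"[A-Za-z0-9']+", text)
    found = []
    i = 0
    while i < len(words):
        match_len = 0
        # Try the longest n-gram first, then shrink.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n]).lower()
            if candidate in lookup:
                found.append(lookup[candidate])
                match_len = n
                break
        # Skip past a match, otherwise advance one word.
        i += match_len if match_len else 1
    return found

# Hypothetical stand-in for a list of Wikipedia titles.
titles = {"Barack Obama", "White House", "Twitter"}
article = "Barack Obama spoke at the White House. Obama later posted on Twitter."
entities = extract_entities(article, titles)
```

Note that the bare mention "Obama" is not matched here because only the full title is in the set; handling aliases would need extra data such as redirect titles.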
4. Approach Followed
    - Shortlist the dominant entities from the extracted entities
        - Dominant entities are those most frequently talked about in the article.
        - Methods:
            - Based on the frequency of the entities
            - Based on the entities occurring in the title of the article
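One way to combine the two shortlisting signals from slide 4 is to rank entities by mention count and add a bonus for entities that appear in the title. The slides do not say how the two methods were combined, so the additive `title_bonus` and the `top_k` cutoff below are assumptions made purely for illustration.

```python
from collections import Counter

def dominant_entities(entities, title, top_k=3, title_bonus=2):
    """Rank entities by mention count, boosting those that occur in the title."""
    counts = Counter(entities)
    title_lower = title.lower()
    scores = {}
    for entity, count in counts.items():
        bonus = title_bonus if entity.lower() in title_lower else 0
        scores[entity] = count + bonus
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda e: (-scores[e], e))
    return ranked[:top_k]

mentions = ["Obama", "White House", "Obama", "Twitter", "Obama", "Twitter"]
top = dominant_entities(mentions, "Obama addresses the nation", top_k=2)
```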
5. Approach Followed
    - Mine all the tweets related to the dominant entities
        - Done using the Twitter Search API
        - Tweets about the entities need to be collected from around the date when the article was published.
        - Tweets need to be parsed before they are stored, to make them ready for the next steps.
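The slides do not describe the parsing or the date window in detail. The sketch below shows one plausible version of both, using only the standard library: it strips URLs, @mentions and hash signs before tokenizing, and keeps tweets within a few days of publication. The cleaning rules, the helper names, and the three-day window are assumptions; no Twitter API call is shown.

```python
import re
from datetime import date, timedelta

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")

def parse_tweet(text):
    """Drop URLs and @mentions, keep hashtag words, return lowercase tokens."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")
    return re.findall(r"[a-z0-9']+", text.lower())

def near_publication(tweet_date, published, window_days=3):
    """True if the tweet falls within window_days of the article's publication date."""
    return abs(tweet_date - published) <= timedelta(days=window_days)

tokens = parse_tweet("Loving the new #Budget speech by @someone! http://example.com/x")
in_window = near_publication(date(2014, 3, 2), date(2014, 3, 1))
out_of_window = near_publication(date(2014, 3, 10), date(2014, 3, 1))
```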
6. Approach Followed
    - Categorize each tweet as positive, negative or neutral
        - Treat all unigram tokens equally
        - Score each token using the naive Bayes formula
        - Sum the scores of all the tokens to calculate the score for an entity
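The slides do not spell out the naive Bayes formula they used. A common choice, assumed here, is the log-likelihood ratio of a token under the positive and negative classes with add-one smoothing and equal class priors; summing the token scores gives the tweet score. The `margin` that defines the neutral band and the four training tweets are illustrative assumptions.

```python
import math
from collections import Counter

def train(labelled_tweets):
    """Count unigram occurrences per class from (tokens, label) pairs."""
    counts = {"pos": Counter(), "neg": Counter()}
    for tokens, label in labelled_tweets:
        counts[label].update(tokens)
    vocab = set(counts["pos"]) | set(counts["neg"])
    return counts, vocab

def token_score(token, counts, vocab):
    """log P(token | pos) - log P(token | neg), with add-one smoothing."""
    v = len(vocab)
    p_pos = (counts["pos"][token] + 1) / (sum(counts["pos"].values()) + v)
    p_neg = (counts["neg"][token] + 1) / (sum(counts["neg"].values()) + v)
    return math.log(p_pos) - math.log(p_neg)

def classify(tokens, counts, vocab, margin=0.5):
    """Sum token scores; scores inside [-margin, margin] count as neutral."""
    score = sum(token_score(t, counts, vocab) for t in tokens if t in vocab)
    if score > margin:
        return "positive"
    if score < -margin:
        return "negative"
    return "neutral"

# Tiny hypothetical training set, standing in for a labelled tweet corpus.
training = [
    (["great", "speech"], "pos"),
    (["love", "this"], "pos"),
    (["terrible", "speech"], "neg"),
    (["hate", "this"], "neg"),
]
counts, vocab = train(training)
```

Tokens that never appeared in training are skipped, so a tweet made up entirely of unseen words scores zero and is labelled neutral.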
7. Approach Followed
    - Predict the interestingness of the article using the number of positive and negative tweets. We followed the approach below:
        - The smaller the difference between the number of positive tweets and the number of negative tweets, the more interesting the article.
        - On the other hand, if the number of positive entities outweighs the number of negative entities, or vice versa, the article is considered less interesting.
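The rule on slide 7 can be turned into a score in several ways. One simple option, assumed here and not taken from the slides, is to normalize the difference by the total number of opinionated tweets, so that an even split scores 1 and a one-sided split scores close to 0. The article names and counts in the example are made up.

```python
def interestingness(positive, negative):
    """1.0 when opinion is evenly split, approaching 0.0 when it is one-sided."""
    total = positive + negative
    if total == 0:
        return 0.0  # no opinionated tweets, so no evidence of interest
    return 1.0 - abs(positive - negative) / total

def rank_articles(tweet_counts):
    """Order article names from most to least interesting."""
    return sorted(
        tweet_counts,
        key=lambda name: interestingness(*tweet_counts[name]),
        reverse=True,
    )

# Hypothetical (positive, negative) tweet counts per article.
article_counts = {"budget": (50, 50), "sports": (90, 10), "weather": (30, 20)}
ranking = rank_articles(article_counts)
```

Sorting by this score is one way to order search results by interestingness, as the application described at the start proposes.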
8. Datasets Used
    - For articles
        - A set of random news articles taken from the BBC News Dataset
    - For sentiment analysis
        - Mejaj Dataset
            - Built by categorizing tweets using a predefined list of positive and negative words
        - Stanford Dataset
9. Challenges
    - Collecting the right set of articles for testing our model
    - Finding the right Twitter dataset, and then deciding on the parameters used to categorize a tweet
    - Choosing an appropriate algorithm for deciding the interestingness of the article, based on the positive and negative tweets
10. Conclusion
    - Social media, such as Twitter in this case, is a very common medium for people to express their opinions nowadays. It can be leveraged as a powerful signal for predicting the nature of the data published on the web, especially the millions of articles that are published each day. It can also be used to suggest articles to users.
    - References
        - Mining Sentiments from Tweets, SIEL, IIIT-Hyderabad