The document discusses identifying and ranking weird or bizarre news stories based on their "weirdness" score. It presents the problem statement, challenges, proposed solution of using machine learning classifiers and ranking algorithms, and evaluation of results. The highest accuracy of around 87% was achieved using a bidirectional LSTM with attention on a dataset of over 100,000 news articles that were classified as either weird or normal. For ranking weirdness levels from 0 to 3, the same model achieved an accuracy of 76% using average ranking and 43% for majority ranking.
3. The problem
Problem statement
● Classifying given news as weird or normal news (2-class classification problem)
● For weird news, obtaining a rank using ML techniques
● Predicting the weirdness level of these news items
Challenges
● Finding out the most important features
● Same things, same actions, different objects
● Object-action mapping is not always useful
● Manual annotation of weird news into ranks
4. Final Deliverable
The implementation provides a user interface showing each news item
ranked from 1-4, plus a title weirdness filter that gives the
weirdness score (0-3) of a title and tells whether it is weird or not.
5. Definition and Motivation
● Weird/Bizarre news
○ A news article so strange that readers might question its credibility.
○ strange or bizarre
● Usually very rare, strange and unbelievable.
● Keeps the boredom away.
● Bizarre news can be used to gain readers' attention and increase viewership.
7. Related Work
● No prior work directly related to weird and bizarre news.
● However, we find Clickbait Detection and Fake News Detection closely
related to our topic.
○ Chen, Yimin, Niall J. Conroy, and Victoria L. Rubin. "Misleading online
content: Recognizing clickbait as false news."
○ Chakraborty, Abhijnan, et al. "Stop clickbait: Detecting and preventing
clickbaits in online news media."
○ Bajaj, Samir. "“The Pope Has a New Baby!” Fake News Detection Using
Deep Learning."
8. Dataset
Weird News Articles: 67361
Normal News Articles: 46893
Total News Articles: 114254
The data includes the URL and the title of each news article.
9. Solution
Proposed Plans:
1. Classification Algorithms:
a. Support Vector Machine (SVM)
b. Random Forest
c. Logistic Regression
d. Deep Neural Network
e. Recurrent Neural Networks (RNN, LSTM, GRU)
f. Attention Network along with RNNs
2. Tools Used
a. Python
b. spaCy
c. Flask
d. Scikit-learn
e. Keras
f. Theano
10. 1st Phase
● Finding the scope of document
● Understanding and Building Project Prototype
● Discussion on
○ Applications
○ Challenges
○ Tools
○ References
● Google links : Scope Document
12. 2nd Phase
● Classifying a news article as weird/bizarre or normal.
● In this phase we worked with some of the following features:
○ Handcrafted features: title length, number of nouns, number of stop
words, number of verbs, frequency of co-occurring words
○ Linguistic features: POS tags, n-grams
○ Word embeddings: pretrained GloVe embeddings, Doc2Vec embeddings
trained on our dataset
● Google links : Scope Document
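As a rough illustration of the simpler handcrafted and n-gram features, the sketch below computes title length, stop-word count, and bigrams using only the Python standard library. The stop-word list, the function names, and the example title are all invented for this example; counting nouns and verbs would additionally need a POS tagger such as spaCy, which is left out here.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "is", "for", "with", "at"}

def tokenize(title):
    """Lowercase a title and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", title.lower())

def ngrams(tokens, n):
    """Return all n-grams (as tuples) over a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def handcrafted_features(title):
    """Compute a few title-level features for one news title."""
    tokens = tokenize(title)
    return {
        "title_length": len(tokens),                              # length in tokens
        "num_stop_words": sum(t in STOP_WORDS for t in tokens),   # stop-word count
        "bigrams": ngrams(tokens, 2),                             # word bigrams
    }

# Invented example title, not taken from the dataset.
features = handcrafted_features("Man marries the pizza in a park")
print(features["title_length"], features["num_stop_words"])
```

Such a feature dictionary could then be turned into a numeric vector and combined with embeddings before being passed to a classifier.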
14. Results
Using TF-IDF Vector
Model            Training Score   Testing Score
Random Forest    0.9124           0.7929
Neural Network   0.8030           0.7981
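For readers unfamiliar with TF-IDF, here is a minimal sketch of the textbook weighting (term frequency times log inverse document frequency) in plain Python. It is not the project's code: the function name and the three toy titles are assumptions, and library implementations such as scikit-learn's use a smoothed, normalised variant, so their values will differ.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute TF-IDF weights for a list of tokenised documents.

    tf  = term count / document length
    idf = log(N / df), where df is the number of documents containing the term
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors

# Toy titles invented for illustration.
titles = [
    "man bites dog".split(),
    "dog bites man".split(),
    "council approves budget".split(),
]
vectors = tfidf_vectors(titles)
print(vectors[0])
```

Terms that occur in only one title ("budget") receive a higher weight than terms shared across titles ("man"), which is what makes the representation useful as classifier input.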
15. Results
Accuracy obtained for above classifiers (without using URL Feature):
Model                 Training Score   Testing Score
Random Forest         0.9450           0.8070
Neural Network        0.8162           0.8108
Decision Tree         0.9534           0.7746
Logistic Regression   0.8128           0.8084
SVM                   0.8112           0.8076
16. 3rd Phase
● In the third phase of the project, we worked on how to rank the given news
articles on the basis of their weirdness.
● Each member annotated 500 news articles with a rating of 0-3 where 3
represents highly weird and 0 represents close to conventional news.
● After the annotation part, the ranking data from all the team members was
merged into a single file
● Different ranking schemes like average and majority rank were used.
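The two ranking schemes can be sketched as follows, assuming several team members rated the same article on the 0-3 scale. The rounding rule (half-up) and the tie-break for the majority rank (lowest rating wins) are assumptions of this sketch; the slides do not say how either was handled, and the ratings shown are invented.

```python
from collections import Counter

def average_rank(ratings):
    """Mean of the annotator ratings, rounded half-up to a class in 0-3."""
    return int(sum(ratings) / len(ratings) + 0.5)

def majority_rank(ratings):
    """Most frequent rating; ties go to the lower rating (an assumption)."""
    counts = Counter(ratings)
    best = max(counts.values())
    return min(r for r, c in counts.items() if c == best)

# Invented annotations: one list of ratings per article.
annotations = {
    "title_a": [3, 3, 2, 1],
    "title_b": [0, 1, 1, 2],
}
labels = {t: (average_rank(r), majority_rank(r)) for t, r in annotations.items()}
print(labels)  # {'title_a': (2, 3), 'title_b': (1, 1)}
```

The first article shows how the two schemes can disagree: the average pulls the label toward the middle, while the majority keeps the most common extreme rating.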
17. Approach
● We pose the problem of ranking weird/bizarre news as a multi-class
classification problem.
● Each news article is given a label depending on the weirdness of the
article.
● There are 4 classes (0-3) where 3 refers to highly weird news and 0 is
close to conventional news.
18. Evaluations and Results
Using RNN and GloVe:
Model              Accuracy (Average Rank)   Accuracy (Majority Rank)
LSTM               0.737                     0.414
BiLSTM             0.748                     0.421
BiLSTM+attention   0.76                      0.43
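The slides do not describe the attention layer, so the following is only a generic sketch of dot-product attention pooling over a sequence of recurrent hidden states: each time step is scored against a query vector, the scores are normalised with a softmax, and the hidden states are summed with those weights. The function names, the query vector (which would be learned in a real model), and the toy hidden states are all assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Weight each time step by softmax(h_t . query) and return the weighted sum."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context

# Toy hidden states for a 3-token title (invented values).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]  # stands in for a learned parameter
weights, context = attention_pool(hidden_states, query)
print(weights, context)
```

The resulting context vector replaces the last hidden state as the input to the final classification layer, letting the model emphasise the tokens that matter most.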
19. Evaluations and Results
Using TF-IDF Vector (score is accuracy)
Model            Testing Score
Random Forest    0.69
Neural Network   0.79
20. Evaluations and Results
Model                 Testing Score
Neural Network        0.79
Logistic Regression   0.79
SVM                   0.79
Random Forest         0.72
Decision Tree         0.59
Score is accuracy.
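Since the score reported here is accuracy, a short sketch of how it is computed for the four weirdness classes may help: it is the fraction of articles whose predicted class equals the annotated class. The function name and the labels below are invented for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true class labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Invented labels on the 0-3 weirdness scale.
y_true = [0, 1, 2, 3, 3, 1, 0, 2]
y_pred = [0, 1, 2, 3, 2, 1, 1, 2]
score = accuracy(y_true, y_pred)
print(score)  # 0.75
```

Note that accuracy treats every error the same, so predicting 2 for a true 3 costs as much as predicting 0 for a true 3.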
22. References
○ Chen, Yimin, Niall J. Conroy, and Victoria L. Rubin. "Misleading online content: Recognizing clickbait as false news." Proceedings of the 2015 ACM on Workshop on Multimodal Deception Detection. ACM, 2015.
○ Chakraborty, Abhijnan, et al. "Stop clickbait: Detecting and preventing clickbaits in online news media." Advances in Social Networks Analysis and Mining (ASONAM), 2016 IEEE/ACM International Conference on. IEEE, 2016.
○ Bajaj, Samir. "“The Pope Has a New Baby!” Fake News Detection Using Deep Learning."
Links
● Github: https://github.com/satyammittal/WEIRD-NEWS
● Website: https://satyammittal.github.io/
News items with nearly the same object but different actions can differ in whether they are normal or weird. The same happens with the same action but different objects.
If we continue with object-action mapping, some sentences, such as titles, may not have a subject-predicate structure.
Finding out the most important features to build and train the model.
This presentation walks you through the roadmap and the final deliverable of our project.
Our project classifies news as weird or normal and predicts the weirdness level of each news item.