The document presents a model called the Temporal Intention Relevancy Model (TIRM) to detect inconsistencies between the intended and actual states of web resources shared on Twitter. It describes collecting data on tweet-link pairs using Amazon Mechanical Turk to train the model, extracting features related to the links, tweets, and archives, and using a random forest classifier that achieved 90.32% accuracy. Key findings were that over 25% of resources shared had changed by the time readers clicked the link, and features like celebrity mentions, number of archives, and text similarity were most predictive of intention.
Reading the Correct History? Modeling Temporal Intention in Resource Sharing
1. Reading the Correct History?
Modeling Temporal Intention in
Resource Sharing
Hany SalahEldeen & Michael Nelson Reading the Correct History?
Hany M. SalahEldeen & Michael L. Nelson
Old Dominion University
Department of Computer Science
Web Science and Digital Libraries Lab.
2. Hany SalahEldeen & Michael Nelson 1 Reading the Correct History?
Possible Scenario:
• We share web pages
• Web pages change
• Readers explore shared pages
What I share might not be what my readers read
3. Motivation
A temporal inconsistency can arise in
the intention of the author regarding
the state of the resource between the
tweet time and the read time…
Hany SalahEldeen & Michael Nelson 2 Reading the Correct History?
Can we detect and model this
difference in intention?
4. The game plan
Hany SalahEldeen & Michael Nelson 3 Reading the Correct History?
Problem Illustration
Training data collection attempts
The TIRM model
Ground truth validation
Data collection
Feature extraction and modeling
Model evaluation
6. Clicking on the link in the tweet …
Hany SalahEldeen & Michael Nelson 5 Reading the Correct History?
7. Using the Twitter expanded interface
Hany SalahEldeen & Michael Nelson 6 Reading the Correct History?
The attack on the embassy was in February
2013
8. Problem: There is an inconsistency between what the tweet's author intended to share at time t_tweet and what the reader might actually read upon clicking on the link at time t_click.
Hany SalahEldeen & Michael Nelson 7 Reading the Correct History?
9. Hany SalahEldeen & Michael Nelson 8 Reading the Correct History?
Implication: Since tweets are considered
the first draft of history… the historical
integrity of the tweets could be
compromised.
10. Solution: Detect the correct intention
Hany SalahEldeen & Michael Nelson 9 Reading the Correct History?
Option 1 Option 2 Option 3
11. The game plan
Hany SalahEldeen & Michael Nelson Reading the Correct History?
Problem Illustration
Training data collection attempts
The TIRM model
Ground truth validation
Data collection
Feature extraction and modeling
Model evaluation
12. Amazon’s Mechanical Turk (MT)
• Crowdsourcing Internet marketplace
• Co-ordinates the use of human intelligence to
perform tasks that computers are currently unable to
do.*
Hany SalahEldeen & Michael Nelson 10 Reading the Correct History?
* http://en.wikipedia.org/wiki/Amazon_Mechanical_Turk
13. Goal: Collect user intention data via MT
Hany SalahEldeen & Michael Nelson 11
Reading the Correct History?
[Pipeline diagram: Tweets dataset → Intention Classification Tasks → User Intention Data → Train → Classifier]
• Problem:
– It is not as easy as it seems!
14. How not to classify temporal
intention 101
• Given a tweet, is the intended state of the link in the:
Hany SalahEldeen & Michael Nelson 12 Reading the Correct History?
past state? current state? No information?
15. Ground truth collection
• A dataset of 100 tweets classified by:
– Our Web Science and Digital Libraries (WS-DL)
research group members
– MT workers
Hany SalahEldeen & Michael Nelson 13 Reading the Correct History?
16. The agreement was very low…
• Reliability of agreement:
– among WS-DL members: Fleiss' κ = 0.14
– among MT workers: Fleiss' κ = 0.07
• Inter-rater agreement between the collective WS-DL members and the MT workers: Cohen's κ = 0.04
Slight agreement
Hany SalahEldeen & Michael Nelson 14 Reading the Correct History?
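The agreement figures above can be reproduced with standard library routines. A minimal sketch, assuming hypothetical per-rater labels (the slides report only the resulting κ values, not the raw ratings):

```python
# Inter-rater agreement on the 3-way intention labels (past / current / no information).
# The ratings below are hypothetical; the deck reports Fleiss' kappa = 0.14 (WS-DL),
# 0.07 (MT), and Cohen's kappa = 0.04 between the two groups.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# One row per tweet, one column per rater; 0 = past, 1 = current, 2 = no information
ratings = np.array([
    [0, 1, 1, 2],
    [1, 1, 0, 1],
    [2, 2, 1, 2],
    [0, 0, 0, 1],
])

counts, _ = aggregate_raters(ratings)          # tweets x categories count table
print("Fleiss' kappa:", fleiss_kappa(counts))

# Cohen's kappa between two sets of (e.g., majority-vote) labels: WS-DL vs. MT
wsdl_labels = [0, 1, 2, 0]
mt_labels = [1, 1, 2, 0]
print("Cohen's kappa:", cohen_kappa_score(wsdl_labels, mt_labels))
```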
17. So we removed the guessing part:
• The tweet is presented along with the two snapshots:
Hany SalahEldeen & Michael Nelson 15 Reading the Correct History?
at t_tweet and at t_click
18. … and classified the 100 tweets
again
• Via a face to face meeting with WS-DL members.
• Resubmitted the new experiment to MT.
Hany SalahEldeen & Michael Nelson 16 Reading the Correct History?
19. The tweet, current and past
snapshots
Hany SalahEldeen & Michael Nelson 17 Reading the Correct History?
Past Version Current Version
20. The results remained very low
• For 9 MT assignments per tweet:
– If we allowed 4-5 splits, we had a 58% match with WS-DL.
– If we required 3-6 splits or better, we got a 31% match,
which is worse than flipping a coin!
Hany SalahEldeen & Michael Nelson 18 Reading the Correct History?
21. Observations
• Assigning a temporal intention is not
a trivial task.
• MT workers are accustomed to more
straightforward tasks.
• The concept of “time on the web” is
foreign to MT workers.
Hany SalahEldeen & Michael Nelson 19 Reading the Correct History?
22. The game plan
Hany SalahEldeen & Michael Nelson Reading the Correct History?
Problem Illustration
Training data collection attempts
The TIRM model
Ground truth validation
Data collection
Feature extraction and modeling
Model evaluation
23. Idea: We need to transform the
problem from intention to
relevance.
Hany SalahEldeen & Michael Nelson 20 Reading the Correct History?
24. Relevance tasks are simpler
• MT workers are more accustomed to classification tasks, which require a minimal amount of explanation
Is that a cat?
- Yes
- No
Hany SalahEldeen & Michael Nelson 21 Reading the Correct History?
25. Hany SalahEldeen & Michael Nelson 22 Reading the Correct History?
Temporal Intention Relevancy Model
(TIRM)
Between t_tweet and t_click:
The linked resource could have:
• Changed
• Not changed
The tweet and the linked resource could be:
• Still relevant
• No longer relevant
26. Hany SalahEldeen & Michael Nelson 23 Reading the Correct History?
Resource is changed but relevant
• The resource changed
• But it is still relevant
Intention: need the current version of the resource at any time
27. Relevancy and Intention Mapping
Hany SalahEldeen & Michael Nelson 24 Reading the Correct History?
[Quadrant diagram: changed + relevant → Current]
28. Hany SalahEldeen & Michael Nelson 25 Reading the Correct History?
Resource is changed and not relevant
Intention: need the past version of the resource at any time
• The resource changed
• But it is no longer relevant
29. Relevancy and Intention Mapping
Hany SalahEldeen & Michael Nelson 26 Reading the Correct History?
[Quadrant diagram: changed + relevant → Current; changed + not relevant → Past]
30. Hany SalahEldeen & Michael Nelson 27 Reading the Correct History?
Resource is not changed and relevant
Intention: need the past version of the resource at any time
• The resource is not changed
• And it is relevant
31. Relevancy and Intention Mapping
Hany SalahEldeen & Michael Nelson 28 Reading the Correct History?
[Quadrant diagram: changed + relevant → Current; changed + not relevant → Past; not changed + relevant → Past]
32. Hany SalahEldeen & Michael Nelson 29 Reading the Correct History?
Resource is not changed and not relevant
Intention: I am not sure which version of the resource I need
• The resource is not changed
• But it is not relevant
33. Relevancy and Intention Mapping
Hany SalahEldeen & Michael Nelson 30 Reading the Correct History?
[Quadrant diagram: changed + relevant → Current; changed + not relevant → Past; not changed + relevant → Past; not changed + not relevant → Not Sure]
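The four quadrants above reduce to a small decision rule. A minimal sketch of that mapping (function name and return labels are illustrative, not taken from the paper):

```python
# TIRM mapping: (did the resource change?, is it still relevant to the tweet?)
# -> which version of the resource the reader should be shown.
def tirm_intention(changed: bool, relevant: bool) -> str:
    if changed and relevant:
        return "current"   # changed but still relevant -> show the live/current page
    if changed and not relevant:
        return "past"      # changed and no longer relevant -> show the archived (past) page
    if not changed and relevant:
        return "past"      # unchanged and relevant -> the past version (== current) suffices
    return "not sure"      # unchanged but not relevant -> intention is ambiguous

print(tirm_intention(changed=True, relevant=False))   # -> past
```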
34. The game plan
Hany SalahEldeen & Michael Nelson Reading the Correct History?
Problem Illustration
Training data collection attempts
The TIRM model
Ground truth validation
Data collection
Feature extraction and modeling
Model evaluation
35. Next step: validation
• MT workers ≡ judgments of the experts (WS-DL members)
Hany SalahEldeen & Michael Nelson 31 Reading the Correct History?
Is the content still relevant to the tweet?
36. Filtering the results
• We accepted raters with:
– At least 1000 accepted HITs
– 95% acceptance rate
• Average completion time = 61 seconds
• We removed:
– Any assignment that took <10 seconds (a hasty decision)
– Low-quality, repetitive assignments (and banned the offending raters)
Hany SalahEldeen & Michael Nelson 32 Reading the Correct History?
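A minimal sketch of how these filtering rules could be applied to a table of MT assignments, assuming hypothetical column names (the slides do not specify the data layout):

```python
import pandas as pd

# Hypothetical assignment records; column names are assumptions for illustration.
assignments = pd.DataFrame({
    "worker_id": ["w1", "w2", "w3", "w4"],
    "approved_hits": [2500, 800, 1500, 3000],
    "acceptance_rate": [0.98, 0.99, 0.93, 0.97],
    "seconds_to_complete": [61, 45, 120, 8],
})

# Keep only qualified raters, then drop hasty (<10 s) assignments
qualified = assignments[(assignments["approved_hits"] >= 1000) &
                        (assignments["acceptance_rate"] >= 0.95)]
kept = qualified[qualified["seconds_to_complete"] >= 10]
print(kept["worker_id"].tolist())   # -> ['w1']
```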
37. Mechanical Turk Workers vs. Experts
• For 100 tweets, percentage of agreement with WS-DL members:
• Cohen's κ = 0.854 (almost perfect agreement)
Hany SalahEldeen & Michael Nelson 33 Reading the Correct History?
Agreement in three or more votes 93%
Agreement in four or more votes 80%
Agreement with all five votes 60%
38. The game plan
Hany SalahEldeen & Michael Nelson 34 Reading the Correct History?
Problem Illustration
Training data collection attempts
The TIRM model
Ground truth validation
Data collection
Feature extraction and modeling
Model evaluation
39. Data collection
• From the SNAP dataset we extracted:
– Tweets in English
– Each has an embedded URI pointing to an external resource.
– The embedded URI is shortened via Bit.ly
– The external resource:
• Still persists.
• Has at least 10 mementos.
• Is unique.
We extracted 5,937 unique instances
Hany SalahEldeen & Michael Nelson 35 Reading the Correct History?
40. Get the closest memento
Hany SalahEldeen & Michael Nelson 35 Reading the Correct History?
[Timeline diagram: mementos at t1, t2, t3, t4, …, tn surround the tweet time; since Δ1 < Δ2, the memento at t1 is picked]
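A minimal sketch of the selection shown in the diagram: pick the memento whose archival datetime is closest to the tweet's timestamp (names are illustrative):

```python
from datetime import datetime

def closest_memento(memento_datetimes, tweet_datetime):
    # Choose the memento with the smallest absolute time delta from the tweet
    return min(memento_datetimes, key=lambda m: abs(m - tweet_datetime))

mementos = [datetime(2011, 2, 1, 10), datetime(2011, 2, 3, 18), datetime(2011, 2, 7, 9)]
tweet_time = datetime(2011, 2, 3, 12)
print(closest_memento(mementos, tweet_time))   # -> 2011-02-03 18:00:00
```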
41. Sorted Time Delta between tweet and closest memento
Hany SalahEldeen & Michael Nelson 36 Reading the Correct History?
Randomly selected 1,124 instances
Time delta range: 3.07 minutes to 56.04 hours; average: 25.79 hours (~1 day)
[Plot: sorted time deltas between each tweet and its closest memento, with mementos captured both before and after the tweet time]
42. Training dataset
• R_current: The state of the resource at the current time.
• R_click: The state of the resource at click time.
Hany SalahEldeen & Michael Nelson 37 Reading the Correct History?
Relevant Assignments 929 82.65%
Non-Relevant Assignments 195 17.35%
5 MT workers agreeing (5-0 split) 589 52.40%
4 MT workers agreeing (4-1 split) 309 27.49%
3 MT workers agreeing (3-2 close call split) 226 20.11%
43. The game plan
Hany SalahEldeen & Michael Nelson 38 Reading the Correct History?
Problem Illustration
Training data collection attempts
The TIRM model
Ground truth validation
Data collection
Feature extraction and modeling
Model evaluation
44. Feature extraction
• For each tweet we perform:
– Link analysis
– Social Media Mining
– Archival Existence
– Sentiment Analysis
– Content Similarity
– Entity Identification
Hany SalahEldeen & Michael Nelson 39 Reading the Correct History?
45. Link analysis
• Since the tweets have embedded resources shortened by
Bit.ly we can extract:
– Total number of clicks
– Hourly click logs
– Creation dates
– Referring websites
– Referring countries.
• We calculate the depth of the resource in relation to its domain
(whether it is a leaf node or a root page)
– by counting the number of slashes in the resource's URI
Hany SalahEldeen & Michael Nelson 40 Reading the Correct History?
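A minimal sketch of the depth feature, counting path segments in the URI so a domain root scores 0 and deeper leaf pages score higher (the helper name is illustrative):

```python
from urllib.parse import urlparse

def uri_depth(uri: str) -> int:
    # Number of non-empty path segments: 0 for a root page, larger for leaf nodes
    path = urlparse(uri).path
    return len([segment for segment in path.split("/") if segment])

print(uri_depth("http://example.com/"))                     # 0 (root page)
print(uri_depth("http://example.com/2011/02/story.html"))   # 3 (leaf node)
```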
46. Social Media Mining
• Twitter:
– Using Topsy.com’s API to
extract:
• Total number of tweets.
• The most recent 500.
• Number of tweets by
influential users.
Hany SalahEldeen & Michael Nelson 41 Reading the Correct History?
The collection of extracted tweets provided extended context for the resource, authored by users across the twittersphere.
47. Social Media Mining
• Facebook:
– Also mined for likes, shares, posts, and clicks related to each resource.
Hany SalahEldeen & Michael Nelson 42 Reading the Correct History?
48. Archival Existence
• Using Memento TimeMaps we get:
– Total mementos available
– Number of different archives
– The closest archived version to the tweet time
Hany SalahEldeen & Michael Nelson 43 Reading the Correct History?
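A minimal sketch of these archival features, assuming the public Memento aggregator's TimeMap endpoint and a simplified link-format parse (the paper's own tooling is not shown in the slides):

```python
import re
import requests
from urllib.parse import urlparse

# Assumed aggregator endpoint; any Memento-compliant TimeMap source would work.
TIMEMAP_ENDPOINT = "http://timetravel.mementoweb.org/timemap/link/"

def archival_features(uri: str) -> dict:
    timemap = requests.get(TIMEMAP_ENDPOINT + uri, timeout=30).text
    # Naive parse: grab targets whose rel value contains "memento";
    # a production version would use a proper link-format parser.
    memento_uris = re.findall(r'<([^>]+)>;\s*rel="[^"]*memento[^"]*"', timemap)
    archives = {urlparse(m).netloc for m in memento_uris}
    return {"total_mementos": len(memento_uris), "archive_count": len(archives)}

# features = archival_features("http://example.com/")
```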
49. Sentiment Analysis
• Using the NLTK natural language processing libraries
• Extract the most prominent sentiment in the text
Hany SalahEldeen & Michael Nelson 44 Reading the Correct History?
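The slides name NLTK but not a specific classifier; one possible sketch uses NLTK's VADER analyzer (an assumption, shown only to illustrate the feature):

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

def prominent_sentiment(text: str) -> str:
    # Bucket VADER's compound score using its conventional +/-0.05 thresholds
    compound = sia.polarity_scores(text)["compound"]
    if compound > 0.05:
        return "positive"
    if compound < -0.05:
        return "negative"
    return "neutral"

print(prominent_sentiment("I absolutely love this!"))   # -> positive
```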
50. Content Similarity
• Steps:
– We download the page HTML using the Lynx browser.
– We apply a boilerplate-removal algorithm and full-text extraction.
– We calculate the cosine similarity between the two pages.
Hany SalahEldeen & Michael Nelson 45 Reading the Correct History?
70% similarity
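A minimal sketch of the similarity step, with the Lynx download and boilerplate removal replaced by plain strings and TF-IDF cosine similarity standing in for the exact weighting used in the paper:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def page_similarity(past_text: str, current_text: str) -> float:
    # Cosine similarity between the extracted texts of the two page versions
    tfidf = TfidfVectorizer().fit_transform([past_text, current_text])
    return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])

print(page_similarity("protest in tahrir square today",
                      "tahrir square protest continues"))
```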
51. Entity Identification
• By visual inspection we observed that the majority of tweets about
celebrities are related to current events.
• We harvested Wikipedia for lists of actors, politicians, and athletes.
• Checked the existence of a celebrity mention in the tweets.
Hany SalahEldeen & Michael Nelson 46 Reading the Correct History?
Actor: Johnny Depp
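A minimal sketch of the celebrity-mention feature, with a tiny hard-coded name set standing in for the lists harvested from Wikipedia:

```python
# Placeholder for the Wikipedia-harvested lists of actors, politicians, and athletes
CELEBRITIES = {"johnny depp", "barack obama", "lionel messi"}

def mentions_celebrity(tweet_text: str) -> bool:
    # True if any known celebrity name appears in the tweet text
    text = tweet_text.lower()
    return any(name in text for name in CELEBRITIES)

print(mentions_celebrity("Johnny Depp spotted filming downtown"))   # -> True
```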
52. Modeling and Classification
• To remove confusion we removed the close calls (the 3-2 splits), leaving 898 instances
Relevant Assignments 929 82.65%
Non-Relevant Assignments 195 17.35%
5 MT workers agreeing (5-0 split) 589 52.40%
4 MT workers agreeing (4-1 split) 309 27.49%
3 MT workers agreeing (3-2 close call split) 226 20.11%
Hany SalahEldeen & Michael Nelson 47 Reading the Correct History?
53. The trained classifier
• From the feature extraction phase we obtained 39 different features to train the classifier.
• Using 10-fold cross-validation, the cost-sensitive classifier based on Random Forests gave the highest success rate: 90.32%
Hany SalahEldeen & Michael Nelson 48 Reading the Correct History?
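A rough scikit-learn analogue of the classifier named above: class_weight="balanced" stands in for the cost-sensitive wrapper, and random placeholder data stands in for the 39 extracted features and the relevant/non-relevant labels:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((898, 39))           # placeholder for the 39 features over 898 instances
y = rng.integers(0, 2, 898)         # placeholder relevant / non-relevant labels

clf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0)
scores = cross_val_score(clf, X, y, cv=10)   # 10-fold cross-validation
print("mean accuracy:", scores.mean())
```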
54. Testing the model
Hany SalahEldeen & Michael Nelson 49 Reading the Correct History?
10-Fold Cross-Validation Testing
Classifier: cost-sensitive classifier based on Random Forest
Mean Absolute Error: 0.15 | Root Mean Squared Error: 0.27 | Kappa Statistic: 0.39
Incorrectly Classified: 9.68% | Correctly Classified: 90.32%

Per-class results (cost-sensitive classifier based on Random Forest):
Class            Precision  Recall  F-measure
Relevant         0.93       0.96    0.95
Non-Relevant     0.53       0.37    0.44
Weighted Average 0.89       0.90    0.90
55. Feature significance
• Since we have 39 features, we needed to understand the effect of each feature and identify the strongest ones affecting the classification
• We applied a supervised attribute evaluator with Ranker search to find the strongest features
Hany SalahEldeen & Michael Nelson 50 Reading the Correct History?
56. Most significant features sorted by
information gain
Hany SalahEldeen & Michael Nelson 51 Reading the Correct History?
Rank Feature Gain Ratio
1 Existence of celebrities in tweets 0.149
2 Number of mementos 0.090
3 Tweet similarity with current page 0.071
4 Similarity: Current & past page 0.0527
5 Similarity: Tweet & past page 0.04401
6 Original URI’s depth 0.0324
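A rough scikit-learn analogue of the ranking step: mutual information is used here as an information-gain style score, with placeholder data and feature names (the gain-ratio values in the table above come from the paper's own toolchain):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.random((898, 39))                       # placeholder feature matrix
y = rng.integers(0, 2, 898)                     # placeholder labels
feature_names = [f"feature_{i}" for i in range(39)]

scores = mutual_info_classif(X, y, random_state=0)
ranking = sorted(zip(feature_names, scores), key=lambda pair: pair[1], reverse=True)
for name, score in ranking[:6]:                 # top-6 features, as in the table above
    print(name, round(score, 4))
```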
57. The game plan
Hany SalahEldeen & Michael Nelson Reading the Correct History?
Problem Illustration
Training data collection attempts
The TIRM model
Ground truth validation
Data collection
Feature extraction and modeling
Model evaluation
58. Model Evaluation
• The next step was to test the trained model against other datasets and examine the results.
• We tested against:
– The remaining 4,813 of the original 5,937 instances, after extracting the 1,124 used in training.
– The tweet collections based on historic events (MJ, Obama, Iran, Syria, & H1N1).
Hany SalahEldeen & Michael Nelson 52 Reading the Correct History?
59. Results of testing the model
against multiple datasets
Hany SalahEldeen & Michael Nelson 53 Reading the Correct History?
Dataset Status 200 Status 404 or other Relevant % Non-Relevant %
Extended 4,813 instances 96.77% 3.23% 96.74% 3.26%
MJ’s Death 57.54% 42.46% 93.24% 6.76%
H1N1 Outbreak 8.96% 91.04% 97.48% 2.52%
Iran Elections 68.21% 31.79% 94.69% 5.31%
Obama’s Nobel Prize 62.86% 37.14% 93.89% 6.11%
Syrian Uprising 80.80% 19.20% 70.26% 29.75%
60. Hany SalahEldeen & Michael Nelson 54 Reading the Correct History?
Idea: We need to transform the
problem from intention to
relevance.
Recap…
Now we need to transform it back!
61. Mapping TIRM
• We used 70% similarity as a threshold of relevancy.
Hany SalahEldeen & Michael Nelson 55 Reading the Correct History?
62. Conclusions
• TIRM successfully transforms the temporal intention problem into a temporal relevancy problem.
• Temporal relevancy is easier to solve, and MT workers show almost perfect agreement with the experts' opinions.
• We successfully collected a gold-standard dataset of temporal user intention.
• We found a temporal inconsistency in the shared resources ranging from <1% to 25%, depending on the dataset.
Hany SalahEldeen & Michael Nelson 56 Reading the Correct History?