Resurrecting My Revolution
Using Social Link Neighborhood in Bringing Context
to the Disappearing Web
Hany SalahEldeen & Michael Nelson Resurrecting My Revolution TPDL 2013
Hany M. SalahEldeen & Michael L. Nelson
Old Dominion University
Department of Computer Science
Web Science and Digital Libraries Lab.
TPDL 2013
Six Socially Significant Events
• From Twitter, Websites, Books:
    • The Egyptian revolution.
• From Twitter Only:
    • Stanford’s SNAP dataset:
        • Iranian elections.
        • H1N1 virus outbreak.
        • Michael Jackson’s death.
        • Obama’s Nobel Peace Prize.
    • Twitter API:
        • The Syrian uprising.
Social Events Having a Bimodal Time Distribution
Resources Missing & Archived
Collection                  Percentage Missing   Percentage Archived
H1N1 Outbreak               23.49%               41.65%
Michael Jackson             36.24%               39.45%
Iran                        26.98%               43.08%
Obama                       24.59%               47.87%
Egypt                       10.48%               20.18%
Syria                        7.04%                5.35%
(no label in transcript)    31.62%               30.78%
(no label in transcript)    24.47%               36.26%
(no label in transcript)    25.64%               43.87%
(no label in transcript)    26.15%               46.15%
Missing and Archived Percentages Across Time
Previous Conclusions
• Measured 21,625 resources from 6 data sets in the archives and on the live web.
• One year after publication, about 11% of content shared on social media will be gone.
• After that, we lose roughly 0.02% more each day.
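The two figures above can be combined into a rough estimate for resources older than one year. This is a minimal sketch, assuming the loss stays linear after the first year; the function name and the linear form are illustrative assumptions, not the authors' fitted model.

```python
def estimated_percent_missing(age_days: int) -> float:
    """Rough share of shared resources expected to be missing, in percent.

    Assumes about 11% is gone at one year and roughly 0.02% more is lost
    per day after that. Ages under one year are outside this sketch.
    """
    if age_days < 365:
        raise ValueError("this sketch only covers resources at least one year old")
    return 11.0 + 0.02 * (age_days - 365)


# Example: estimates at one, two, and three years after publication.
for days in (365, 730, 1095):
    print(days, round(estimated_percent_missing(days), 2))
```

The estimate grows without bound, so it is only meaningful over the time span the study actually covers.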
New Research Questions
• Validity: is our estimation model still valid?
• Existence Stability: do resources on the live web remain missing?
• Archival Stability: do resources in public archives persist?
• Social Context: how can we extract the social context of missing resources and potential replacements?
Revisiting Existence
• From previous study:
• Rerunning after a year:
MJ Iran H1N1 Obama Egypt Syria
Measured 37.10% 37.50% 28.17% 30.56% 26.29% 31.62% 32.47% 24.64% 7.55% 12.68%
Predicted 31.72% 31.42% 31.96% 30.98% 30.16% 29.68% 29.60% 28.36% 19.80% 11.54%
Error 5.38% 6.08% 3.79% 0.42% 3.87% 1.94% 2.87% 3.72% 12.25% 1.14%
Average Prediction Error = 4.15%
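The average prediction error can be reproduced from the Measured and Predicted rows as a mean absolute difference. The sketch below copies the ten values from the table; the variable and function names are illustrative.

```python
# Values copied from the Measured and Predicted rows (percent missing).
EXISTENCE_MEASURED = [37.10, 37.50, 28.17, 30.56, 26.29, 31.62, 32.47, 24.64, 7.55, 12.68]
EXISTENCE_PREDICTED = [31.72, 31.42, 31.96, 30.98, 30.16, 29.68, 29.60, 28.36, 19.80, 11.54]


def mean_absolute_error(measured, predicted):
    """Average of |measured - predicted| over paired observations."""
    if len(measured) != len(predicted):
        raise ValueError("both rows must have the same number of values")
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


print(round(mean_absolute_error(EXISTENCE_MEASURED, EXISTENCE_PREDICTED), 2))  # 4.15
```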
Revisiting Archival
• From previous study:
• Rerunning after a year:
MJ Iran H1N1 Obama Egypt Syria
Measured 48.61% 40.32% 60.80% 55.04% 47.97% 52.14% 48.38% 40.58% 23.73% 0.56%
Predicted 61.78% 61.18% 62.26% 60.30% 58.66% 57.70% 57.54% 55.06% 37.94% 21.42%
Error 13.17% 20.86% 1.46% 5.26% 10.69% 5.56% 9.16% 14.48% 14.21% 20.86%
Average Prediction Error = 11.57%
In all cases, our archival predictions were too optimistic.
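The direction of the archival error can be checked by keeping the sign of each difference instead of taking absolute values. A small sketch using the values from the archival table; the names are illustrative.

```python
# Values copied from the archival Measured and Predicted rows (percent archived).
ARCHIVED_MEASURED = [48.61, 40.32, 60.80, 55.04, 47.97, 52.14, 48.38, 40.58, 23.73, 0.56]
ARCHIVED_PREDICTED = [61.78, 61.18, 62.26, 60.30, 58.66, 57.70, 57.54, 55.06, 37.94, 21.42]


def signed_errors(measured, predicted):
    """Predicted minus measured; a positive value means the model over-predicted."""
    return [p - m for m, p in zip(measured, predicted)]


errors = signed_errors(ARCHIVED_MEASURED, ARCHIVED_PREDICTED)
always_over_predicted = all(e > 0 for e in errors)
mean_bias = sum(errors) / len(errors)

print(always_over_predicted)  # True
print(round(mean_bias, 2))    # 11.57
```

Because every signed error is positive, the mean bias equals the average prediction error reported in the table.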
Measured Vs. Predicted
Interesting Phenomenon: Reappearance On The Live Web And Disappearance From The Archives
Event                          MJ       Iran     Obama    H1N1    Egypt   Syria   Average
% Re-appearing on the web      11.29%   11.48%    6.63%   3.68%   4.21%   1.97%   6.54%
% Disappearing from archives    9.98%   11.17%   15.65%   5.46%   2.81%   2.25%   7.89%
% Going from 1 memento to 0     2.72%    2.89%    4.24%   1.96%   0.23%   0.28%   2.05%
Reappearance And Disappearance Predictions
Tweet Existence
Problem:
We don’t have the URIs of most of the tweets.
Solution:
Compute the loss of other tweets linking to the same URI.
For each resource in the datasets:
- Extract all the tweets that link to the resource using the Topsy API (up to 500 tweets).
- Check the existence of each tweet.
- (Topsy (mostly) does not delete indexed tweets.)
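The procedure above can be sketched as follows. The slides do not show the Topsy request format, so tweet discovery and the live-web check are passed in as functions; `find_tweets` and `tweet_exists` are hypothetical placeholders, not real API calls.

```python
from typing import Callable, Iterable

MAX_TWEETS = 500  # cap on tweets per resource, as stated on the slide


def percent_missing_tweets(
    resource_uri: str,
    find_tweets: Callable[[str], Iterable[str]],
    tweet_exists: Callable[[str], bool],
) -> float:
    """Percentage of tweets linking to resource_uri that no longer exist.

    find_tweets:  hypothetical lookup returning URIs of tweets that link
                  to the resource (the role the Topsy index plays here).
    tweet_exists: hypothetical check that a tweet is still on the live web.
    """
    tweet_uris = list(find_tweets(resource_uri))[:MAX_TWEETS]
    if not tweet_uris:
        return 0.0
    missing = sum(1 for uri in tweet_uris if not tweet_exists(uri))
    return 100.0 * missing / len(tweet_uris)


# Usage with in-memory stand-ins for the index and the live web.
index = {"http://example.org/a": ["t1", "t2", "t3", "t4"]}
live = {"t1", "t3", "t4"}
print(percent_missing_tweets(
    "http://example.org/a",
    lambda uri: index.get(uri, []),
    lambda tweet: tweet in live,
))  # 25.0
```

Injecting the two lookups keeps the counting logic testable without network access.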
Using Topsy API
Get all the tweets having the same URI
Check how many still exist on the live web
Tweet Existence
Event                        MJ       Iran     Obama    H1N1    Egypt    Syria   Average
Average % of missing posts   14.43%   14.59%   10.03%   7.38%   15.08%   0.53%   10.34%
Tweets Disappearing Across Time
Context Discovery And Shared Resource Replacement
Problem:
The 140-character limit restricts how much a tweet can say about the linked resource. If the resource goes missing, can we get the next best thing?
Solution:
• Shared links typically have several tweets, responses, and retweets.
• We can mine these traces for context and viable replacements.
Context Discovery
Linking to:
http://beta.18daysinegypt.com/
Use Topsy to Discover Tweets with the Same Link
Tweet Text Replacement
• From all extracted tweets, pick the best replacement tweet: the one having the longest common N-gram
Replace with a more descriptive one
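One way to read this selection step is sketched below. It assumes the N-gram is measured in words, that each candidate is compared against the original tweet, and that ties go to the longer tweet as the more descriptive one. None of these choices is stated on the slide, and the function names are illustrative.

```python
import re
from typing import List, Sequence


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def longest_common_ngram(a: Sequence[str], b: Sequence[str]) -> int:
    """Length in words of the longest contiguous run shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def best_replacement(original: str, candidates: List[str]) -> str:
    """Candidate sharing the longest word N-gram with the original tweet.

    Ties are broken in favour of the longer candidate (an assumption).
    """
    if not candidates:
        raise ValueError("no candidate tweets to choose from")
    original_tokens = tokenize(original)

    def score(candidate: str):
        tokens = tokenize(candidate)
        return (longest_common_ngram(original_tokens, tokens), len(tokens))

    return max(candidates, key=score)
```

The dynamic-programming table keeps only two rows, so memory stays proportional to the length of one tweet.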
Resource Replacement
Assume that the resource linked in the tweet has disappeared.
We mine the list of tweets for:
• Hashtags
• User mentions
• Co-tweeted URIs
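Mining these three signals from a list of tweet texts can be sketched with regular expressions. The patterns below are simplified assumptions, not Twitter's official entity rules, and `mine_context` is an illustrative name.

```python
import re
from collections import Counter
from typing import Dict, Iterable

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
URI = re.compile(r"https?://\S+")


def mine_context(tweets: Iterable[str], missing_uri: str) -> Dict[str, Counter]:
    """Count hashtags, user mentions, and co-tweeted URIs across tweets."""
    hashtags, mentions, co_uris = Counter(), Counter(), Counter()
    for text in tweets:
        # Strip trailing punctuation that often clings to a pasted link.
        uris = [u.rstrip(".,;:!?)") for u in URI.findall(text)]
        # Remove links first so '#' fragments and '@' in URIs are not counted.
        text_without_uris = URI.sub(" ", text)
        hashtags.update(h.lower() for h in HASHTAG.findall(text_without_uris))
        mentions.update(m.lower() for m in MENTION.findall(text_without_uris))
        co_uris.update(u for u in uris if u != missing_uri)
    return {"hashtags": hashtags, "mentions": mentions, "co_tweeted_uris": co_uris}


# Usage with two made-up tweets that share the same (missing) link.
sample = [
    "Share your story #Egypt @sara http://a.example/x http://b.example/y",
    "#egypt again http://a.example/x",
]
print(mine_context(sample, "http://a.example/x"))
```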
Co-Tweeted Resources
A missing resource could be described or replaced by another resource that has been shared within the same tweet.
Build a Tweet Document
A tweet document represents the concatenation of all extracted tweets:
“do you have a story to tell about your 18 days of
revolution? share it or contact sara 18days brand new
interactive storytelling project on egyptian revolution a
very creative platform to tell your story daysinegypt
marches heading to tahrir square now from all over cairo
it's all over again use the website to document your
revolutionary stories and share them with the world! check
out awesome documentary project crowdsourcing a
people's narrative of the egyptian revolution …”
Tweet Signatures
Tweet Document:
“do you have a story to tell about your 18 days of
revolution? share it or contact sara 18days brand new
interactive storytelling project on egyptian revolution a
very creative platform to tell your story daysinegypt
marches heading to tahrir square now from all over cairo
it's all over again use the website to document your
revolutionary stories and share them with the world! check
out awesome documentary project crowdsourcing a
people's narrative of the egyptian revolution …”
Tweet Signature = top 5 most frequent terms from Tweet Document
documentary project daysinegypt check sourced
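Extracting a signature as the most frequent terms can be sketched with a term counter. The stop-word list and the minimum term length below are assumptions added so that function words do not dominate; the slides do not say how terms were filtered.

```python
import re
from collections import Counter
from typing import List

# Assumed stop words; the slides do not specify a list.
STOPWORDS = frozenset(
    "a about again all and are from have it its now of on or out over "
    "the them to use very with you your".split()
)


def tweet_signature(tweet_document: str, size: int = 5) -> List[str]:
    """Return the `size` most frequent terms of a tweet document."""
    terms = [
        term
        for term in re.findall(r"[a-z0-9]+", tweet_document.lower())
        if term not in STOPWORDS and len(term) > 2
    ]
    return [term for term, _count in Counter(terms).most_common(size)]


# Usage on a tiny made-up document.
print(tweet_signature("egypt revolution egypt story revolution egypt the the the", size=2))
# ['egypt', 'revolution']
```

The resulting terms are joined with spaces to form the search engine query.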
Query Google w/ Tweet Signature
Search Engine Results
The original resource
Others are good replacement candidates
Recommendation Evaluation
We extract a dataset of resources that are currently available:
• Pretend these resources no longer exist (for a baseline)
• Each resource is text-based
• Each resource has at least 30 retrievable tweets
→ Extracted 731 unique resources
We use a boilerplate removal library* to remove the template from:
• the linked resources
• the top 10 retrieved results from Google
We use cosine similarity to compare the documents.
* https://github.com/misja/python-boilerpipe
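Cosine similarity between two cleaned documents can be sketched in plain Python. This version assumes boilerplate has already been removed and uses raw term frequencies as the vector weights; the slides do not state which weighting was used.

```python
import math
import re
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Bag-of-words term counts for a lowercased text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the two term-frequency vectors."""
    a, b = term_frequencies(text_a), term_frequencies(text_b)
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Usage: one shared term out of two in each document.
print(round(cosine_similarity("egypt revolution", "egypt story"), 2))  # 0.5
```

A search result would count as a replacement candidate when its similarity to the original resource reaches the chosen threshold.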
Similarity Measures In Resource Replacements
Results
In 41% of the test cases, we can find a replacement page with at least 70% similarity to the original missing resource.
The search results provide a mean reciprocal rank of 0.43.
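Mean reciprocal rank averages 1/rank of the first relevant result over all queries. The sketch below assumes the relevant result is the original resource and that a query whose results do not contain it contributes zero; the slides do not spell out either convention.

```python
from typing import List, Optional


def mean_reciprocal_rank(ranks: List[Optional[int]]) -> float:
    """Average of 1/rank over queries.

    ranks: 1-based position of the relevant result for each query,
           or None when it was not retrieved (counted as 0).
    """
    if not ranks:
        raise ValueError("at least one query is required")
    total = 0.0
    for rank in ranks:
        if rank is None:
            continue
        if rank < 1:
            raise ValueError("ranks are 1-based")
        total += 1.0 / rank
    return total / len(ranks)


# Usage: four queries, one of which did not retrieve the target.
print(mean_reciprocal_rank([1, 2, None, 4]))  # 0.4375
```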
Conclusions
• We validated our model for predicting resource existence on the current web, with ~4% error after one year.
• The archival prediction, on the other hand, produced a larger error (~11.5%).
• We explored the phenomenon of resources reappearing on the web after disappearing (6.54%), and of resources disappearing from the archives (7.89%).
• The removal of search engine caches in the most recent Memento revision could be a possible explanation for the disappearance from the archives.
Conclusions (Cont.)
• Estimated the percentage of tweets that are missing (~10.5%).
• Used the Topsy API to extract context information about tweets with missing resources.
• Investigated the viability of finding a replacement resource for the missing one.
• In 41% of the cases, we were able to extract a replacement resource that is 70% or more similar to the missing resource.

FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 

Recently uploaded (20)

Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 

Resurrecting My Revolution: Using Social Link Neighborhood in Bringing Context to the Disappearing Web

  • 1. Resurrecting My Revolution Using Social Link Neighborhood in Bringing Context to the Disappearing Web Hany SalahEldeen & Michael Nelson Resurrecting My Revolution TPDL 2013 Hany M. SalahEldeen & Michael L. Nelson Old Dominion University Department of Computer Science Web Science and Digital Libraries Lab. TPDL 2013
  • 2. Six Socially Significant Events
      • From Twitter, websites, and books:
          • The Egyptian revolution.
      • From Twitter only:
          • Stanford’s SNAP dataset: Iranian elections, H1N1 virus outbreak, Michael Jackson’s death, Obama’s Nobel Peace Prize.
          • Twitter API: the Syrian uprising.
  • 3. Social Events Having a Bimodal Time Distribution
  • 6. Resources Missing & Archived

      Collection        Percentage Missing   Percentage Archived
      H1N1 Outbreak     23.49%               41.65%
      Michael Jackson   36.24%               39.45%
      Iran              26.98%               43.08%
      Obama             24.59%               47.87%
      Egypt             10.48%               20.18%
      Syria             7.04%                5.35%

      Four further percentage pairs appear in the transcript without collection labels: 31.62% / 30.78%, 24.47% / 36.26%, 25.64% / 43.87%, 26.15% / 46.15%.
  • 7. Missing and Archived Percentages Across Time
  • 8. Previous Conclusions
      • Measured 21,625 resources from 6 data sets in the archives and on the live web.
      • One year after publishing, about 11% of content shared on social media will be gone.
      • After that, we lose roughly 0.02% per day.
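The two figures on the Previous Conclusions slide can be read as a piecewise-linear estimate: about 11% lost at the one-year mark, then roughly 0.02% more per day. The sketch below is a minimal illustration under that reading only. The function name and the choice to reject ages under one year are assumptions made here, and this is not the authors' fitted prediction model.

```python
def estimated_percent_missing(age_days: float) -> float:
    """Rough estimate of the percentage of shared resources lost after age_days.

    Assumes ~11% lost at 365 days and ~0.02% lost per day afterwards,
    as stated on the slide. The slide gives no rate for the first year,
    so ages under 365 days are rejected rather than guessed.
    """
    if age_days < 365:
        raise ValueError("the slide only describes loss from one year onward")
    # Cap at 100% so very large ages do not produce impossible percentages.
    return min(100.0, 11.0 + 0.02 * (age_days - 365))


if __name__ == "__main__":
    for days in (365, 730, 1095):
        print(days, "days:", round(estimated_percent_missing(days), 2), "% estimated missing")
```

Under these assumptions a resource shared two years ago would be estimated at a little over 18% likely to be missing.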
  • 9. New Research Questions
      • Validity: is our estimation model still valid?
      • Existence Stability: do resources on the live web remain missing?
      • Archival Stability: do resources in public archives persist?
      • Social Context: how can we extract social context of missing resources and potential replacements?
  • 10. Revisiting Existence
      • From previous study:
      • Rerunning after a year:

      The header row lists six events (MJ, Iran, H1N1, Obama, Egypt, Syria) over ten data columns. The transcript does not preserve which columns belong to which event, so the columns are numbered in their original order.

      Column   Measured   Predicted   Error
      1        37.10%     31.72%      5.38%
      2        37.50%     31.42%      6.08%
      3        28.17%     31.96%      3.79%
      4        30.56%     30.98%      0.42%
      5        26.29%     30.16%      3.87%
      6        31.62%     29.68%      1.94%
      7        32.47%     29.60%      2.87%
      8        24.64%     28.36%      3.72%
      9        7.55%      19.80%      12.25%
      10       12.68%     11.54%      1.14%

      Average Prediction Error = 4.15%
  • 11. Revisiting Archival
      • From previous study:
      • Rerunning after a year:

      The header row lists six events (MJ, Iran, H1N1, Obama, Egypt, Syria) over ten data columns. The transcript does not preserve which columns belong to which event, so the columns are numbered in their original order.

      Column   Measured   Predicted   Error
      1        48.61%     61.78%      13.17%
      2        40.32%     61.18%      20.86%
      3        60.80%     62.26%      1.46%
      4        55.04%     60.30%      5.26%
      5        47.97%     58.66%      10.69%
      6        52.14%     57.70%      5.56%
      7        48.38%     57.54%      9.16%
      8        40.58%     55.06%      14.48%
      9        23.73%     37.94%      14.21%
      10       0.56%      21.42%      20.86%

      Average Prediction Error = 11.57%
      In all cases, our archival predictions were too optimistic.
  • 12. Measured Vs. Predicted
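The average prediction errors on slides 10 and 11 can be re-derived from the measured and predicted rows. The sketch below assumes the error is the mean absolute difference between the two rows, which matches the per-column error values shown. The variable and function names are illustrative.

```python
# Values transcribed from the "Revisiting Existence" and "Revisiting Archival" slides.
existence_measured = [37.10, 37.50, 28.17, 30.56, 26.29, 31.62, 32.47, 24.64, 7.55, 12.68]
existence_predicted = [31.72, 31.42, 31.96, 30.98, 30.16, 29.68, 29.60, 28.36, 19.80, 11.54]
archival_measured = [48.61, 40.32, 60.80, 55.04, 47.97, 52.14, 48.38, 40.58, 23.73, 0.56]
archival_predicted = [61.78, 61.18, 62.26, 60.30, 58.66, 57.70, 57.54, 55.06, 37.94, 21.42]


def mean_absolute_error(measured, predicted):
    """Average of |measured - predicted| over paired values (percentage points)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


existence_error = mean_absolute_error(existence_measured, existence_predicted)
archival_error = mean_absolute_error(archival_measured, archival_predicted)

# True when every archival prediction exceeds the measured value,
# i.e. the predictions were too optimistic in all columns.
archival_always_optimistic = all(
    p > m for m, p in zip(archival_measured, archival_predicted)
)

if __name__ == "__main__":
    print("existence:", round(existence_error, 2))
    print("archival:", round(archival_error, 2))
    print("archival predictions always too high:", archival_always_optimistic)
```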
  • 13. Interesting Phenomenon: Reappearance On The Live Web And Disappearance From The Archives

      Event     % Re-appearing on the web   % Disappearing from archives   % Going from 1 memento to 0
      MJ        11.29%                      9.98%                          2.72%
      Iran      11.48%                      11.17%                         2.89%
      Obama     6.63%                       15.65%                         4.24%
      H1N1      3.68%                       5.46%                          1.96%
      Egypt     4.21%                       2.81%                          0.23%
      Syria     1.97%                       2.25%                          0.28%
      Average   6.54%                       7.89%                          2.05%
  • 14. Reappearing And Disappearance Predictions
  • 15. Tweet Existence
      Problem: we don’t have the URIs of most of the tweets.
      Solution: compute the loss of other tweets linking to the URI. For each resource in the datasets:
      • Extract all the tweets that link to the resource using the Topsy API (up to 500 tweets).
      • Check the existence of each tweet.
      • (Topsy (mostly) does not delete indexed tweets.)
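The per-resource check on the Tweet Existence slide can be sketched as follows. This is a minimal illustration that starts from a list of tweet URLs already retrieved for one resource; the lookup through Topsy is not reproduced here. The names `tweet_exists` and `percent_missing`, and the rule that a tweet counts as existing when its URL answers with a non-error HTTP status, are assumptions rather than the authors' implementation.

```python
import urllib.request
from typing import Callable, Iterable

MAX_TWEETS = 500  # cap on tweets per resource, as mentioned on the slide


def tweet_exists(url: str, timeout: float = 10.0) -> bool:
    """Assumed existence test: the URL answers a HEAD request without an error."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # OSError covers HTTP errors, network failures and timeouts;
        # ValueError covers malformed URLs.
        return False


def percent_missing(
    tweet_urls: Iterable[str],
    exists: Callable[[str], bool] = tweet_exists,
) -> float:
    """Percentage of the (capped) tweets linking to a resource that no longer exist."""
    urls = list(tweet_urls)[:MAX_TWEETS]
    if not urls:
        return 0.0
    missing = sum(1 for url in urls if not exists(url))
    return 100.0 * missing / len(urls)
```

Passing the existence check in as a parameter keeps the counting logic testable without network access, for example with a stub that marks a known subset of URLs as deleted.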
  • 16. Using Topsy API
      • Get all the tweets having the same URI.
      • Check how many still exist on the live web.
  • 17. Tweet Existence

      Event     Average % of missing posts
      MJ        14.43%
      Iran      14.59%
      Obama     10.03%
      H1N1      7.38%
      Egypt     15.08%
      Syria     0.53%
      Average   10.34%
  • 18. Tweets Disappearing Across Time
  • 19. Context Discovery And Shared Resource Replacement
      Problem: the 140-character limit restricts the description of the linked resource. If the resource goes missing, can we get the next best thing?
      Solution:
      • Shared links typically have several tweets, responses, and retweets.
      • We can mine these traces for context and viable replacements.
  • 20. Context Discovery. Linking to: http://beta.18daysinegypt.com/
  • 21. Use Topsy to Discover Tweets with the Same Link
  • 22. Tweet Text Replacement
      • From all extracted tweets, select the best replacement tweet: the one having the longest common N-gram with the original.
      • Replace the original tweet text with this more descriptive one.
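The selection step on the Tweet Text Replacement slide can be sketched with a word-level longest common N-gram. The slide does not say how tweets are tokenized or how ties are resolved, so the lowercase whitespace tokenization and the tie-break in favor of the longer candidate (as a stand-in for "more descriptive") are assumptions made for illustration.

```python
from typing import List, Optional


def longest_common_ngram(a: str, b: str) -> int:
    """Length in words of the longest run of consecutive words shared by a and b."""
    x, y = a.lower().split(), b.lower().split()
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        curr = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common run that ended at the previous word pair.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def best_replacement(original: str, candidates: List[str]) -> Optional[str]:
    """Pick the candidate sharing the longest N-gram with the original tweet.

    Ties are broken by candidate length (assumption: longer is more descriptive).
    """
    others = [c for c in candidates if c != original]
    if not others:
        return None
    return max(others, key=lambda c: (longest_common_ngram(original, c), len(c)))
```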
  • 23. Resource Replacement
      Assume that the resource linked in the tweet disappeared. We mine the list of tweets for:
      • Hashtags
      • User mentions
      • Co-tweeted URIs
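Mining the tweets that share a missing link for hashtags, user mentions, and co-tweeted URIs could look like the sketch below. The regular expressions are simplified assumptions and not Twitter's official entity rules, and the function name `mine_context` is illustrative.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

URI_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


def mine_context(
    tweets: Iterable[str], missing_uri: str
) -> Tuple[Counter, Counter, Counter]:
    """Count hashtags, mentions and co-tweeted URIs across tweets linking to missing_uri."""
    hashtags, mentions, co_uris = Counter(), Counter(), Counter()
    for text in tweets:
        # Collect URIs first, dropping trailing punctuation that is not part of the link.
        uris = [u.rstrip(".,;:!?)") for u in URI_RE.findall(text)]
        co_uris.update(u for u in uris if u != missing_uri)
        # Remove URIs before looking for '#' and '@' so URL fragments are not miscounted.
        rest = URI_RE.sub(" ", text)
        hashtags.update(tag.lower() for tag in HASHTAG_RE.findall(rest))
        mentions.update(name.lower() for name in MENTION_RE.findall(rest))
    return hashtags, mentions, co_uris
```

The most frequent co-tweeted URIs are then natural candidates for describing or replacing the missing resource, as the following slides discuss.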
  • 24. Co-Tweeted Resources
      A missing resource could be described or replaced by another resource that has been shared within the same tweet. (Figure label: "replaces".)
  • 25. Co-Tweeted Resources
      A missing resource could be described or replaced by another resource that has been shared within the same tweet. (Figure labels: "replaces", "Or".)
  • 26. Build a Tweet Document
      A tweet document represents the concatenation of all extracted tweets:
      “do you have a story to tell about your 18 days of revolution? share it or contact sara 18days brand new interactive storytelling project on egyptian revolution a very creative platform to tell your story daysinegypt marches heading to tahrir square now from all over cairo it's all over again use the website to document your revolutionary stories and share them with the world! check out awesome documentary project crowdsourcing a people's narrative of the egyptian revolution …”
  • 27. Tweet Signatures
      Tweet Signature = top 5 most frequent terms from the Tweet Document:
      documentary project daysinegypt check sourced
      Tweet Document:
      “do you have a story to tell about your 18 days of revolution? share it or contact sara 18days brand new interactive storytelling project on egyptian revolution a very creative platform to tell your story daysinegypt marches heading to tahrir square now from all over cairo it's all over again use the website to document your revolutionary stories and share them with the world! check out awesome documentary project crowdsourcing a people's narrative of the egyptian revolution …”
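Building a tweet document and taking its most frequent terms as a signature can be sketched as below. The slides do not say how terms are tokenized or which common words are ignored, so the tokenizer and the small stop-word list are assumptions. Because the tweet document on the slide is truncated, this sketch is not expected to reproduce the exact signature shown there.

```python
import re
from collections import Counter
from typing import Iterable, List

# Small illustrative stop-word list (assumption; the slides do not specify one).
STOPWORDS = {
    "a", "about", "again", "all", "an", "and", "are", "do", "from", "have",
    "in", "is", "it", "it's", "now", "of", "on", "or", "out", "over", "the",
    "them", "to", "use", "with", "you", "your",
}


def build_tweet_document(tweets: Iterable[str]) -> str:
    """Concatenate all extracted tweets into one lowercase document."""
    return " ".join(tweets).lower()


def tweet_signature(tweets: Iterable[str], k: int = 5) -> List[str]:
    """Return the k most frequent non-stop-word terms of the tweet document."""
    document = build_tweet_document(tweets)
    terms = [
        t for t in re.findall(r"[a-z0-9']+", document)
        if t not in STOPWORDS and len(t) > 1
    ]
    return [term for term, _ in Counter(terms).most_common(k)]


if __name__ == "__main__":
    # Made-up example tweets, for illustration only.
    example = [
        "check out awesome documentary project on the egyptian revolution",
        "brand new documentary project about the revolution",
        "share your revolution story with the documentary team",
    ]
    print(" ".join(tweet_signature(example)))
```

The resulting terms are joined into a single query string, which is how the signature is used in the search step on the next slide.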
  • 28. Query Google w/ Tweet Signature
  • 29. Search Engine Results. Figure label: the original resource.
  • 30. Search Engine Results. Figure labels: the original resource; others are good replacement candidates.
  • 31. Recommendation Evaluation
      We extract a dataset of resources that are currently available:
      • Pretend these resources no longer exist (for a baseline).
      • Each resource is text-based.
      • Each resource has at least 30 retrievable tweets.
      Result: extracted 731 unique resources.
  • 32. Recommendation Evaluation
      We use a boilerplate removal library* to remove the template from:
      • the linked resources
      • the top 10 results retrieved from Google
      We use cosine similarity to compare the documents.
      * https://github.com/misja/python-boilerpipe
  • 33. Similarity Measures In Resource Replacements
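The comparison step, cosine similarity between the text of the original resource and the text of a search result, can be sketched in plain Python. The slide does not state how terms are weighted, so raw term frequencies are assumed here. Boilerplate removal is left out, and the inputs are taken to be already-extracted text.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Raw term-frequency vector over lowercase word tokens (assumed weighting)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the two term-frequency vectors, in [0, 1]."""
    va, vb = term_vector(text_a), term_vector(text_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

With this measure, a candidate page would count as a good replacement when its similarity to the original reaches the chosen threshold.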
  • 34. Results
      • In 41% of the test cases we can find a replacement page with at least 70% similarity to the original missing resource.
      • The search results provide a mean reciprocal rank of 0.43.
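The two summary figures on the Results slide correspond to standard measures that can be sketched as below. What counts as the relevant search result for the reciprocal rank is not spelled out on the slide, so the input is simply the 1-based rank of the first relevant result per test case, or None when there is none. Both function names are illustrative.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_relevant_ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all test cases; a case with no relevant result adds 0."""
    if not first_relevant_ranks:
        return 0.0
    total = sum(1.0 / rank for rank in first_relevant_ranks if rank is not None)
    return total / len(first_relevant_ranks)


def fraction_with_replacement(
    best_similarities: Sequence[float], threshold: float = 0.70
) -> float:
    """Share of test cases whose best candidate reaches the similarity threshold."""
    if not best_similarities:
        return 0.0
    hits = sum(1 for s in best_similarities if s >= threshold)
    return hits / len(best_similarities)
```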
  • 35. Conclusions
      • We validated our model for predicting resource existence on the current web, with ~4% error after one year.
      • The archival prediction, on the other hand, produced a larger error of ~11.5%.
      • We explored the phenomenon of resources reappearing on the web after disappearing (6.54%), and of resources disappearing from the archives (7.89%).
      • The removal of search engine caches in the most recent Memento revision is a possible explanation for the disappearance from the archives.
  • 36. Conclusions (cont.)
      • Measured the estimated percentage of missing tweets: ~10.5%.
      • Utilized the Topsy API to extract context information about tweets with missing resources.
      • Investigated the viability of finding a replacement resource for the missing one.
      • In 41% of the cases we were able to extract a replacement resource that is 70% or more similar to the missing resource.