Web Science and Digital Libraries
Research Group
@WebSciDL
Review of Projects for
Herbert Van de Sompel, LANL
August 5 & 6, 2015
Corren McCoy
Disambiguation of Alumni from Publicly
Available Social Media Profiles
Presentation for Herbert Van de Sompel
08/05/2015
Let’s be Social!
Directory Search
Name: Michael Nelson
College: Old Dominion
Degree: Computer Science
Year: 1997
Motivation
Maintain
relationships
with alumni
Interact and
re-engage
Pew Research Survey, Sept. 2014
LinkedIn is used by 28% of online
adults. 23% are between 18-29*
Twitter is used by 23% of online
adults. 37% are between 18-29
*Pew Research Center noted a significant change in this percentage from 2013
Research Goals
• Given a discrete set of attributes
• Leverage public information
• Collect structured/unstructured metadata
• Develop a probabilistic matching scheme
• Analyze and discover new profile attributes
• Connect the networks
Seminal Works
• Mislove, A., Viswanath, B., Gummadi, K. P., & Druschel,
P. (2010, February). You are who you know: inferring
user profiles in online social networks. In Proceedings
of the third ACM international conference on Web
search and data mining (pp. 251-260). ACM.
• Northern, C. T., & Nelson, M. L. (2011). An
unsupervised approach to discovering and
disambiguating social media profiles. In Proceedings of
Mining Data Semantics Workshop.
• Powell, J., Shankar, H., Rodriguez, M., & Van de
Sompel, H. (2014). EgoSystem: Where are our
Alumni?. Code4Lib Journal, (24).
Our Work is Informed
Mislove: Attribute inference based on a Facebook
crawl of a known friends network with
matching to a Student or Alumni
Directory.
Northern: Examination of digital preservation
strategies across social media sites using
feature data to score and disambiguate
the discovered profiles.
Powell: Aggregation of discovered social and
institutional artifacts to a public identity
which are linked in a property graph to
facilitate searching.
Similarity Metrics
Does it help to know a name?
Name Ranking as of 2014
Social Security Administration (first names) | Census (surnames)
Michael: 7 | Nelson: 40
Michele: ----- | Weigle: 13,604
First names: 19,584 | Surnames: 150,436
Are Vanity Screen Names Re-used?
LinkedIn: michaellloydnelson
Twitter: phonedude_mln
Is the Affiliation Repeated?
LinkedIn: Old Dominion University
Twitter: Old Dominion University mentioned
in bio but could be a false positive
How Far Apart in Space?
LinkedIn: Norfolk, Virginia Area
Twitter: Norfolk, VA
Do People Re-use Profile Photos?
TinEye Reverse Image Search
Do Web Links Point to the Same Page?
LinkedIn: http://www.cs.odu.edu/~mln/
http://ws-dl.blogspot.com/
http://f-measure.blogspot.com/
Twitter: cs.odu.edu/~mln/
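The link comparison above can be sketched in a few lines of Python. This is only an illustration, assuming that ignoring the scheme, a leading "www.", and a trailing slash is an acceptable notion of "same page"; the function names are hypothetical and not part of the presented system.

```python
from urllib.parse import urlsplit


def normalize_link(link: str) -> str:
    """Reduce a profile link to a comparable host + path key."""
    link = link.strip()
    if "://" not in link:
        # Twitter often displays links without a scheme
        link = "http://" + link
    parts = urlsplit(link)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def shared_links(links_a, links_b):
    """Return the normalized links that appear on both profiles."""
    return {normalize_link(a) for a in links_a} & {normalize_link(b) for b in links_b}
```

With the LinkedIn and Twitter links listed on the slide, both forms of the cs.odu.edu link reduce to the same key, so the two profiles share one link.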
Community Analysis
Surrogate Connections - People Also Viewed
One step from Dr. Nelson One step from Brittany Johnson
Community Analysis
Disclosed – (Followers?) and Following
Property Graph Analysis
https://twitter.com/phonedude_mln
https://www.linkedin.com/in/michaellloydnelson
Geo-Location: Norfolk, Virginia area; Norfolk, VA
Affiliation: Value: Old Dominion
Attended
Twitter: @ODUNow
hasOfficialAccount
follows
Example Searches
LinkedIn Candidate Search
• Leverage Google’s advanced
search operators to improve
precision.
• Trusted information from the
Registrar’s Office.
LinkedIn Metadata
How Prevalent are Nicknames?
Name | Michael Nelson | Mike Nelson | Mike Nelson
Headline | Professor at Old Dominion University | Orthotist / Certified Athletic Trainer | Driver at Old Dominion Freight Line
Location | Norfolk, Virginia Area | Providence, Rhode Island Area | Phoenix, Arizona
URL | https://www.linkedin.com/in/michaellloydnelson | https://www.linkedin.com/in/mikenelson64 | https://www.linkedin.com/pub/mike-nelson/6b/50b/879
Profile Photo | https://media.licdn.com/mpr/mpr/shrinknp_400_400/p/1/000/019/1d1/39275de.jpg | https://media.licdn.com/mpr/mpr/shrinknp_400_400/p/2/000/02f/11d/3f17849.jpg | -----
Vanity Screen Name | michaellloydnelson | mikenelson64 |
Industry | Research | Hospital & Health Care | Transportation/Trucking/Railroad
Websites | http://www.cs.odu.edu/~mln/, http://ws-dl.blogspot.com/, http://f-measure.blogspot.com/ | ----- | ----
Affiliation(s) | Old Dominion University, 1997-2000; Old Dominion University, 1996-1997; Virginia Polytechnic Institute and State University, 1987-1991 | Old Dominion University, 1999-2001 | -----
Twitter Candidate Search
Twitter Metadata
Given and Nickname Search
User Name | Michael L. Nelson | Mike Nelson | Mike Nelson
Bio | Head of @WebSciDL, Computer Science, Old Dominion University; Formerly: @NASA_Langley (1991-2002), @UNCSILS (2000-2001); OAI-PMH OAI-ORE Memento ResourceSync | ----- | -----
Location | Norfolk, VA | ----- | -----
URL | https://twitter.com/phonedude_mln | https://twitter.com/mikenelson64 | -----
Profile Photo | https://pbs.twimg.com/profile_images/959295176/mln-ad-100x130_400x400.jpg | ----- | -----
Screen Name | Phonedude_mln | mikenelson64 |
Industry | ----- | ----- | -----
Websites | cs.odu.edu/~mln/ | ----- |
Affiliation(s) | Old Dominion University in bio. Following @ODUNow official account | ----- | -----
Known Issues
• Reliability of Name Searches
– Nicknames list from the Northern (2011) study is
incomplete. Ignores ethnic given names.
– Given and surname data from US census and SSA
must exist at a certain threshold to protect privacy.
– Naïve calculation of name probabilities. Some name
combinations do not occur frequently.
• Uncovering social data is difficult
– LinkedIn limits use of API to get real connections.
– Rate limits on the Twitter API constrain the depth of
the followers/following search.
Known Issues
• Each network takes a different approach to
the visibility of metadata
– Exploit the structure of LinkedIn
– Twitter data is noisy, limited space with no
controlled vocabulary
By: Alexander Nwala
August 5, 2015
Progress Report
Presented To:
Dr. Herbert Van de Sompel, Dr. Michael Nelson
Progress Report
Outline
• Past projects
• Refactoring Hany’s Carbon date
• What Did It Look Like?
• I Can Haz Memento
• Present research
• Exploration of Distributed Information Retrieval
• Problem
• Goal
• Research paths; possible contributions
Carbon date
• Estimates the creation date of a URI
• The current implementation features a:
• Threaded server
• Concurrent API requests
• Cached responses
• The estimate is the earliest date found among
these sources:
• Last modified date
• Bitly
• Topsy
• Backlinks
• Archives
Website: http://cd.cs.odu.edu
Blog post: http://ws-dl.blogspot.com/2014/11/2014-11-14-carbon-dating-web-version-20.html
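The "earliest date wins" rule can be sketched as follows. This is a minimal illustration, not the Carbon date implementation: the function name and the dictionary of per-source dates are assumptions, and a source that returns nothing is represented by None.

```python
from datetime import datetime
from typing import Dict, Optional


def estimate_creation_date(candidates: Dict[str, Optional[datetime]]) -> Optional[datetime]:
    """Return the earliest date reported by any source, or None if no source has one."""
    known = [date for date in candidates.values() if date is not None]
    return min(known) if known else None
```

In the real service each source would be queried concurrently and cached; here the per-source lookups are left out so the selection step stays visible.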
What Did It Look Like?
• Tumblr blog which
• Uses the Memento framework to poll various public web archives
• Creates an animated image for each year that shows the progression of the site
through the years
• Everyone is free to nominate web sites to What Did It Look Like? by tweeting:
“#whatdiditlooklike URL”
Website: http://whatdiditlooklike.mementoweb.org/
Blog post: http://ws-dl.blogspot.com/2015/01/2015-02-05-what-did-it-look-like.html
I Can Haz Memento
• Inspired by the “#icanhazpdf” movement and also built upon the Memento
framework
• For tweets with links containing “#icanhazmemento”
• I Can Haz Memento service replies to the tweet with a link pointing to the
archived version of the page closest to the time of the tweet
Website: https://twitter.com/icanhazmemento/
Blog post: http://ws-dl.blogspot.com/2015/07/2015-07-22-i-can-haz-memento.html
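Picking the archived version closest to the tweet time can be illustrated with a small Python sketch. In the Memento framework a TimeGate performs this selection when given an Accept-Datetime header; the function below only mimics the selection locally over a list of (datetime, URI-M) pairs, and its name and input shape are assumptions.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def closest_memento(mementos: List[Tuple[datetime, str]], tweet_time: datetime) -> Optional[str]:
    """Return the URI-M whose capture time is nearest to tweet_time."""
    if not mementos:
        return None
    # Smallest absolute distance in time, before or after the tweet
    return min(mementos, key=lambda m: abs(m[0] - tweet_time))[1]
```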
Problem :: Undiscoverable resources are not included in SERPs
• SERP does not have intended resource: “A kinetic theory for age-
structured stochastic birth-death processes”
• But resource is available in a special collection (arXiv.org)
Case 1, SERP for Query: “stochastic birth-death processes”
Google
Search
arXiv.org
Search
Problem :: Information not discoverable from Google does not exist for many web users
• 1st page of SERP does not have intended resource: “EPIDEMIOLOGY THROUGH
CELLULAR…”
Case 2, SERP for Query: “influenza indonesia”
Google
Search
arXiv.org
Search
Relevant resource
on 7th page
Relevant resource
on 1st page
Problem :: Inconsistent views between SERP and special collections
Problem :: When to stop?
• A user potentially misses relevant information because it is NOT presented in the
search results OR is presented too far down (e.g. on the 7th page)
• In other words, if relevant content is not presented in the first n pages (e.g. n < 3),
it effectively does not exist
Goal :: Present resources from multiple unindexed sources with Google SERP
• This can be achieved through middleware such as a browser plugin
1. 10 more relevant resources
2. Click
Relevant resource
on 1st page
Exploration of DIR :: Problem summary and Goal
• Problem
• Inconsistent views between SERP and special collections
leads to absence of relevant resources in SERPs (Case 1)
• If relevant content is not presented in the first n pages (e.g.
n < 3), it does not exist (Case 2)
• Goal
• Present resources from multiple unindexed sources with
Google SERP
Exploration of DIR :: Possible research paths
• Research Pathway 1: Understanding the search results
• Research Pathway 2: Understanding the query
• Research Pathway 3: Understanding the data source
Research Pathway 1 vs Research Pathway 2
Research Pathway 2:
Understanding the query
• Blindly routing every query to every data
source is unacceptable
• Query understanding
• Domain classification of query
• Intent recognition of query
• Semantic labelling of query
• Route only queries that are relevant to the
data source, to the data source: e.g. a
News related query to a News source,
academic queries to academic sources
• State of the art targets building statistical
machine learning methods to solve the
query understanding problem
• Include results from data source with SERP
Research Pathway 1:
Understanding search results
• Blindly routing every query to every data
source is unacceptable
• Understand the search results for clues to
unravel nature of query
• Are Advertisements present
• Are Images present
• Are PDF types present
• Route only queries that are relevant to the
data source, to the data source: e.g. a
News related query to a News source,
academic queries to academic sources
• State of the art doesn’t focus on search
results
• Include results from data source with SERP
Research Pathway 1: Find discriminative features for “non-scholarly materials
domain”
Query length
Permutation of Pages
Result count
Title match
Images present
HTML resource
News present
Google knowledge
entity present
Research Pathway 1: Find discriminative features for “scholarly materials domain”
Query length
Permutation of Pages
Result count
Title subset match
PDF resources
Notable Absences
• Google Knowledge
Entity
• News
• Ads
Notable Presence
• Non HTML
resources (PDF)
Research Pathway 1: What next after finding discriminative features?
• Find a dataset (Done)
• NASA NTRS query log for scholarly materials domain (400,000+)
• AOL 2006 query logs for non-scholarly materials domain (400,000+)
• Train a classifier (Not done)
• Given a query and a list of search results. Classify the query as
belonging to one of multiple classes e.g. (Scholarly material)
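Before a classifier can be trained, each (query, search results) pair has to be turned into a feature vector. The sketch below shows one way to compute a few of the features named on the previous slides. The result format (dicts with "title", "url", and an optional "type" key) is a hypothetical stand-in for whatever a SERP scraper would return.

```python
def serp_features(query, results):
    """Compute a few discriminative features for a query and its search results."""
    query_terms = set(query.lower().split())
    title_terms = [set(r["title"].lower().split()) for r in results]
    return {
        "query_length": len(query.split()),
        "result_count": len(results),
        # True when every query term appears in at least one result title
        "title_subset_match": any(query_terms <= terms for terms in title_terms),
        "pdf_present": any(r["url"].lower().endswith(".pdf") for r in results),
        "news_present": any(r.get("type") == "news" for r in results),
        "ads_present": any(r.get("type") == "ad" for r in results),
    }
```

A dictionary like this could be fed to any off-the-shelf classifier once the query logs are labeled by domain.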
Research Pathway 2: Heuristic for unsupervised domain classification
Original algorithm 1:
• Idea: Given a query and a list of search results, the important terms which co-
occur across multiple search results are indicative of the domain of the query.
Query
1: Search Engine URIs List
doc1 <a, a, a, b, b.., c>
doc2 <a, c, x, y, d, d>
docn <a, p, w, s>
2: Generate unigram vectors, remove redundant terms
<a, b, c> <a, c, x, y, d> <a, p, w, s>
3: Sort
<a, a, a, b, c, c, d, p, s, w, x, y>
4: Find clusters
<a, a, a> <b> <c, c> <d> <p> <s> <w> <x> <y>
Domain Set: P
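The four steps can be sketched in Python as below. This is an illustration under simplifying assumptions: each document is already tokenized into a list of terms, clusters are formed from identical terms only (the example on the next slide suggests the original also groups terms with a shared prefix), and the `min_docs` threshold is a hypothetical parameter.

```python
from collections import Counter


def domain_set(docs, min_docs=2):
    """docs: one list of terms per search result. Returns {term: number of docs}."""
    # Step 2: unigram vector per document with redundant terms removed
    unique_per_doc = [sorted(set(terms)) for terms in docs]
    # Step 3: merge all vectors and sort
    merged = sorted(term for doc in unique_per_doc for term in doc)
    # Step 4: clusters of identical terms; the cluster size is the number
    # of documents in which the term occurs
    clusters = Counter(merged)
    return {term: size for term, size in clusters.items() if size >= min_docs}
```

For the three example documents on the slide this keeps "a" (3 documents) and "c" (2 documents) as candidate domain terms.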
Original algorithm 1 Example: Possible domains for query “Lionel messi”
• (terms), 10 of 11 pages
• (barcelona"., barcellona-granada, barcelon,, barcelon,
barcelona), 9 of 11 pages
• (best"., best), 9 of 11 pages
• (championship, champion, championship,,
champions..., champions:, championships.,
champions', championships, championship:,
champions.", championship-winning, champions,
champions".), 9 of 11 pages
• (city, city)), 9 of 11 pages
• (club, club's, club's...), 9 of 11 pages
• (consented, considerably, consecutively).,
consecutively,, considered, consent, consistent,
conscious, consecutively"., consecutive, considers,
consider), 9 of 11 pages
• (everybody, every), 9 of 11 pages
• (fc, fc.), 9 of 11 pages
• (football, football".), 9 of 11 pages
• (game"., game".[370], game), 9 of 11 pages
Relevant domains based
on human judgement
Original algorithm 2: Heuristic for supervised domain classification
• Given a set of predefined domains D:
<a, a, a> <b> <c, c> <d> <p> <s> <w> <x> <y>
4: Find clusters
Domain set: P
…
max( similarity (Pi, Di) )
• Similarity
• Naive hybrid similarity (Jaccard/Overlap coefficient)
• WordNet
• Explicit Semantic Analysis
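The slide does not spell out how the Jaccard and Overlap coefficients are combined, so the following sketch assumes a simple weighted average and should be read as one possible "naive hybrid", not the presenter's definition. The `best_domain` helper and the equal weighting are assumptions.

```python
def jaccard(a, b):
    """Jaccard coefficient: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def overlap(a, b):
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    return len(a & b) / min(len(a), len(b)) if (a and b) else 0.0


def hybrid_similarity(a, b, weight=0.5):
    """Weighted average of the two coefficients (assumed combination)."""
    return weight * jaccard(a, b) + (1 - weight) * overlap(a, b)


def best_domain(p, domains):
    """Return the predefined domain whose term set is most similar to P."""
    return max(domains, key=lambda name: hybrid_similarity(p, domains[name]))
```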
Exploration of DIR :: Summary
• Problem
• There exists an inconsistency between SERP and
special collections, thus many relevant resources are not
included in SERPs or
• Included too late (e.g. last page)
• Goal
• Present resources from multiple unindexed sources with Google
SERP which can be done through a browser plugin
• Research Pathways
• Understand the search result and train a model to learn when a
query should be forwarded to a special collection
• Understand the query, for example the domain, then forward
only relevant queries to their respective special collections
• Include results from special collection with SERP
TEMPORAL COHERENCE
OF COMPOSITE
MEMENTOS IN WEB
ARCHIVES
SCOTT G. AINSWORTH
OLD DOMINION UNIVERSITY
AUGUST 5, 2015
OLD DOMINION UNIVERSITY
CONTENTS
■ Motivation
(Appearances can be deceiving)
■ Background
■ Temporal Coherence
■ Research
■ What’s next?
8/5/2015 Scott G. Ainsworth • Status for Herbert Van de Sompel Visit
MOTIVATION
TEMPORAL COHERENCE
OF COMPOSITE MEMENTOS
IN WEB ARCHIVES
APPEARANCES …
http://web.archive.org/web/20041209190926/http://www.wunderground.org/cgi-bin/findWeather/getForecast?query=50593 (now 404, but that's a different story…)
… CAN BE DECEIVING
Root Memento-Datetime: 2004-12-09T19:09:26
CLEAR OR CLOUDY?
QUESTIONS
■ How prevalent is temporal incoherence?
■ Can Temporal Coherence be improved using
■ Multiple archives?
■ Additional memento selection heuristics?
■ How can Temporal Coherence be conveyed?
BACKGROUND
COMPOSITE MEMENTOS
COHERENCE STATES
COHERENCE PATTERNS
TEMPORAL COHERENCE
OF COMPOSITE MEMENTOS
IN WEB ARCHIVES
COMPOSITE MEMENTO
PRESENTATION STRUCTURE
[Diagram: root memento URI-M0 with embedded mementos URI-M1, URI-M2, ..., URI-Mi-1, URI-Mi, URI-Mi+1, ..., URI-Mn]
COHERENCE STATES
■ Prima Facie Coherent
Evidence that the memento existed in its
archived state when the root was acquired.
■ Prima Facie Violative
Evidence … did not exist ...
■ Possibly Coherent
Evidence … might have existed ...
■ Probably Violative
Evidence … probably did not exist ...
CONSIDER THIS HTML…
<html>
<img src="foo.jpeg">
</html>
AND THESE RESPONSE HEADERS
HTTP/1.1 200 OK
Server: Tengine/2.0.3
Date: Mon, 27 Apr 2015 22:03:32 GMT
Content-Type: image/jpeg
Content-Length: 15632
Connection: keep-alive
Memento-Datetime: Tue, 07 Feb 2006 00:58:23 GMT
Link: <Memento links deleted...>
X-Archive-Orig-server: Apache/1.3.26 (Unix) ApacheJServ/1.1.2 PHP/4.3.4
X-Archive-Orig-etag: "4978-3d10-3e4d822e"
X-Archive-Orig-content-length: 15632
X-Archive-Orig-accept-ranges: bytes
X-Archive-Orig-date: Tue, 07 Feb 2006 00:58:20 GMT
X-Archive-Orig-content-type: image/jpeg
X-Archive-Orig-last-modified: Fri, 14 Feb 2003 23:56:30 GMT
X-Archive-Orig-connection: close
<other headers deleted>
PRIMA FACIE COHERENT
Bracket Pattern:
Memento-Datetime + Last-Modified
(yes, Last-Modified is sometimes wrong, but many of those cases can be detected)
PRIMA FACIE COHERENT
Equal Pattern: simultaneous capture
(with an optionally tunable “bubble of simultaneity”)
PRIMA FACIE VIOLATIVE
POSSIBLY COHERENT
Closest (or only) memento captured before the
root
PROBABLY VIOLATIVE
Closest (or only) memento captured after the
root but no Last-Modified (possibly indicating a
dynamically generated representation)
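The four coherence states can be expressed as a small decision function over the root Memento-Datetime, the embedded memento's Memento-Datetime, and its original Last-Modified value. The sketch below is one reading of the patterns on these slides; in particular, the rule for Prima Facie Violative (Last-Modified falls after the root capture) is an assumption, since the slide for that state carries no text, and the function name and `bubble` parameter are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def coherence_state(root_dt: datetime,
                    memento_dt: datetime,
                    last_modified: Optional[datetime] = None,
                    bubble: timedelta = timedelta(0)) -> str:
    """Classify an embedded memento relative to its root memento."""
    # Equal pattern: captured within the "bubble of simultaneity"
    if abs(memento_dt - root_dt) <= bubble:
        return "Prima Facie Coherent"
    # Captured before the root: it might still have existed unchanged
    if memento_dt < root_dt:
        return "Possibly Coherent"
    # Captured after the root, with no evidence of when it last changed
    if last_modified is None:
        return "Probably Violative"
    # Bracket pattern: Last-Modified <= root <= Memento-Datetime
    if last_modified <= root_dt:
        return "Prima Facie Coherent"
    # Assumed rule: modified after the root was captured
    return "Prima Facie Violative"
```

For the response headers shown earlier (Memento-Datetime 2006-02-07, Last-Modified 2003-02-14), any root captured between those two dates would be bracketed and the image classified as Prima Facie Coherent.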
TEMPORAL
COHERENCE
EMBEDDED RESOURCES
REPRESENTING COHERENCE
TEMPORAL COHERENCE
OF COMPOSITE MEMENTOS
IN WEB ARCHIVES
TEMPORAL COHERENCE
[Figure: root captured 2005-05-14 01:36:08; embedded resources captured +9 days, +18 days, +18 days, +7 months, and +2.1 years later]
EMBEDDED RESOURCES
Resource | Memento-Datetime | Delta
http://www.cs.odu.edu | 2005-05-14 01:36:08 |
mm_menu.js | 2005-05-23 02:39:12 | 9.0 d
style.css | 2005-05-23 02:39:39 | 9.0 d
gfx-logo-odu-crown.gif | 2005-05-23 02:39:39 | 9.0 d
ddmenu_ddown.js | 2005-05-23 02:39:43 | 9.0 d
university.js | 2005-05-23 02:39:56 | 9.0 d
rmenu_1st_about.png | 2005-06-01 13:40:25 | 18.5 d
rmenu_bottom_229.gif | 2005-06-01 14:07:29 | 18.5 d
shadow-bl.gif | 2005-06-01 14:55:53 | 18.6 d
ecsbdg.jpg | 2005-06-01 14:56:17 | 18.6 d
shadow-br.gif | 2005-06-01 15:18:18 | 18.6 d
gfx-btn-go-dblue.gif | 2005-06-01 15:34:19 | 18.6 d
shadow-tr.gif | 2005-06-01 15:55:57 | 18.6 d
header-right1.gif | 2005-06-01 16:06:16 | 18.6 d
spacer.gif | 2005-06-01 16:23:10 | 18.6 d
jimcheng.gif | 2005-06-01 16:37:39 | 18.6 d
jsmith.gif | 2005-06-01 16:58:50 | 18.6 d
rmenu_1st_featured_alumni.png | 2005-06-01 21:21:45 | 18.8 d
hmenu_college_...-new.png | 2005-12-21 20:14:25 | 7.3 mo
rmenu_1st_upcoming_news.png | 2005-12-21 20:15:14 | 7.3 mo
rmenu_1st_upcoming_events.png | 2005-12-21 21:01:12 | 7.3 mo
lmenu_1st_resources.png | 2005-12-28 17:47:41 | 7.5 mo
bullet_blue_triangle.gif | 2005-12-28 19:43:48 | 7.5 mo
logo-cs.gif | 2005-12-28 19:54:29 | 7.5 mo
rmenu_1st_featured_student.png | 2007-06-12 02:36:07 | 2.1 years
shadow-b.gif | 2007-06-21 02:35:17 | 2.1 years
shadow-r.gif | 404 Not Found |
Embedded Resources: 26
Mean Delta: 125.9 days
Standard Deviation: 207.7 days
Minimum Delta: 9.0 days
Maximum Delta: 2.1 years
REPRESENTING COHERENCE
THE FULL CHART
[Figure: Mementos by Delta. Horizontal axis: delta from the root (-3y to 6y). Vertical axis: Root Memento-Datetime (2001 to 2013). Legend: rURI-M, Prima Facie Coherent, Possibly Coherent, Probably Violative, Prima Facie Violative. Label: 2005-03-10]
RESEARCH
DATA SET
SAMPLING
STATISTICS
TEMPORAL COHERENCE
OF COMPOSITE MEMENTOS
IN WEB ARCHIVES
DATA SET
■ 4,000 sample URI-Rs (JCDL’11 data set)
■ Single and Multiple Archives
■ Two Heuristics:
■ Minimum distance (current default
Wayback behavior)
■ choose closest Memento-Datetime
■ Bracket (proposed here)
■ use combination of Memento-Datetime +
Last-Modified (when available)
SAMPLING & RECOMPOSITION
■ For each sample URI-R (rURI-R):
■ Download available TimeMaps
■ Download a single root Memento per
month
■ For each monthly Memento
■ Extract embedded URI-Rs (eURI-Rs)
■ Download TimeMaps for eURI-Rs
■ Download heuristically-best eURI-Ms
■ Repeat recursively
■ Run each heuristic and single-/multi-
archive combination
ROOT URI-R STATISTICS
Archival Data
Root URI-Rs archived: 2,756 (68.9%)
In multiple archives: 1,180 (29.5%)
Mean archives per URI-R: 1.58
Mean mementos per URI-R: 124.57
URI-M Status
200 OK: 82,425 (93.6%)
503 Service Unavailable: 4,444 (5.0%)
404 Not found: 583 (0.7%)
403 Forbidden: 388 (0.4%)
Others: 214 (0.3%)
EMBEDDED URI-R STATISTICS
Archival Data
Embedded URI-Rs: 1,623,127
per root URI-M: 19.7
Embedded URI-Ms available: 1,332,993 (93.6%)
per root URI-M: 15.1
URI-M Failure Reasons
Not archived: 312,641 (83.9%)
404 Not found: 44,852 (12.0%)
403 Forbidden: 6,116 (1.6%)
503 Service Unavailable: 5,442 (1.5%)
Others: 3,508 (0.9%)
COMPOSITE MEMENTO (ROOT)
COMPLETENESS & COHERENCE
Completeness (and Missing)
Description | MinDist Single | MinDist Multi | Bracket Single | Bracket Multi
Mean Complete | 76.1% | 80.2% | 76.2% | 80.3%
Mean Missing | 23.9% | 19.8% | 23.8% | 19.7%
Coherence
Description | MinDist Single | MinDist Multi | Bracket Single | Bracket Multi
Mean Prima Facie Coherent | 41.0% | 40.9% | 54.7% | 54.6%
Mean Possibly Coherent | 27.3% | 28.7% | 12.8% | 14.2%
Mean Probably Violative | 2.5% | 5.3% | 2.5% | 5.3%
Mean Prima Facie Violative | 5.3% | 5.3% | 6.2% | 6.2%
At least 5% of pages can be shown to have temporal violations!
Multiple archives: +completeness, -coherence?
EMBEDDED MEMENTO COHERENCE
Description | MinDist Single | MinDist Multi | Bracket Single | Bracket Multi
Prima Facie Coherent | 622,565 | 621,447 | 864,736 | 859,625
Possibly Coherent | 497,405 | 466,046 | 244,104 | 215,585
Probably Violative | 104,376 | 53,734 | 104,339 | 53,694
Prima Facie Violative | 100,760 | 103,662 | 114,062 | 117,469
Totals | 1,325,106 | 1,244,889 | 1,327,241 | 1,246,373
Description | MinDist Single | MinDist Multi | Bracket Single | Bracket Multi
Prima Facie Coherent | 47.0% | 49.9% | 65.2% | 69.0%
Possibly Coherent | 37.5% | 37.4% | 18.4% | 17.3%
Probably Violative | 7.9% | 4.3% | 7.9% | 4.3%
Prima Facie Violative | 7.6% | 8.3% | 8.6% | 9.4%
At least 7% of embedded resources are used violatively!
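The percentage table follows directly from the count table. As a quick check, this sketch recomputes the MinDist Single column from its counts; the helper name is hypothetical and the numbers are copied from the slide.

```python
mindist_single = {
    "Prima Facie Coherent": 622_565,
    "Possibly Coherent": 497_405,
    "Probably Violative": 104_376,
    "Prima Facie Violative": 100_760,
}


def percentages(counts):
    """Convert per-state counts into percentages of the column total, one decimal."""
    total = sum(counts.values())
    return {state: round(100 * n / total, 1) for state, n in counts.items()}
```

The column total is 1,325,106, and the Prima Facie Violative share of that total is the 7.6% behind the "at least 7%" statement.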
WHAT’S NEXT?
EQUALITY & SIMILARITY
MINOR & MAJOR VIOLATIONS
POLICIES & HEURISTICS
CONVEYING COHERENCE
TEMPORAL COHERENCE
OF COMPOSITE MEMENTOS
IN WEB ARCHIVES
EQUALITY & SIMILARITY
Equality and similarity allow prima facie
coherence without Last-Modified
Early results: equality yields < 2% improvement
MINOR OR MAJOR VIOLATIONS?
■ This is a temporal violation. But is it
meaningful?
■ How to judge?
■ Most archives transform HTML
■ Few support export of original file
■ How to measure similarity on binary files?
POLICY & HEURISTIC TRADEOFFS
■ Speed: minimize distance
■ Completeness: query all archives
(not just top k)
■ Accuracy: maximize coherence
CONVEYING COHERENCE
How to scale to > 100
embedded mementos?
How to convey coherence
& contributing archive?
WHAT’S NEXT SUMMARY
■ Equality & Similarity
■ Significance of violation (major? minor?)
■ Policies & Heuristics
■ Conveying Coherence
Progress Report
Lulwah Alkwai
Presented to:
Dr. Herbert Van de Sompel
Previous Work
JCDL 2015 Paper:
“How Well Are Arabic Websites Archived?”
Lulwah M. Alkwai, Michael L. Nelson, and Michele C. Weigle
We won “Best Student Paper Award”
English sports websites are more archived than Arabic
www.espn.go.com www.kooora.com
How do we classify Arabic websites?
• Neither
– News: alarabiya.net
– ccTLD: Not Arabic (.net)
– GeoIP: Not an Arabic country (US)
• ccTLD only
– E-Marketing: haraj.com.sa
– ccTLD: Arabic (.sa)
– GeoIP: Not an Arabic country (Ireland)
• GeoIP only
– News: al-watan.com
– ccTLD: Not Arabic (.com)
– GeoIP: Arabic country (Qatar)
• Both
– Educational: uoh.edu.sa
– ccTLD: Arabic (.sa)
– GeoIP: Arabic country (SA)
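The four-way classification can be sketched as a function of a hostname and a GeoIP country code. The ccTLD and country sets below are deliberately partial and only illustrative, and the GeoIP lookup itself is assumed to happen elsewhere; neither reflects the lists used in the paper.

```python
# Partial, illustrative lists (assumptions, not the paper's full lists)
ARABIC_CCTLDS = {"sa", "qa", "eg", "ae"}
ARABIC_COUNTRIES = {"SA", "QA", "EG", "AE"}


def classify(host: str, geoip_country: str) -> str:
    """Place a site in one of the four quadrants: Both, ccTLD only, GeoIP only, Neither."""
    has_cctld = host.rsplit(".", 1)[-1].lower() in ARABIC_CCTLDS
    has_geoip = geoip_country.upper() in ARABIC_COUNTRIES
    if has_cctld and has_geoip:
        return "Both"
    if has_cctld:
        return "ccTLD only"
    if has_geoip:
        return "GeoIP only"
    return "Neither"
```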
Selecting seed URIs
Name | Registered | Year | URI | Count
DMOZ | US | 1999 | Dmoz.org/world/arabic | 4,086
Raddadi | Saudi Arabia | 2000 | Raddadi.com | 3,271
Star28 | Lebanon | 2004 | Star28.com | 8,386
Total | | | | 15,743
• 15,092 unique seed URIs
• 11,014 URIs that existed in the live web
Language test intersection testing for Arabic language
[Figure: overlap of the language tests. Labels: ~41%, ~38%, ~36%, ~39%; 872 (~8%)]
Total Arabic URIs Dataset = (7,976+292,670) = 300,646
Crawling Arabic seed URIs
Findings
Our Arabic language dataset was largely not located in Arabic countries
• Only 14.84% had an Arabic ccTLD
• Only 10.53% had a GeoIP in an Arabic country
• Popular Western domains (e.g., cnn.com, wikipedia.org) appeared in
the top 10
Arabic webpages are not particularly well archived or indexed
• 46% were not archived
• 31% were not indexed by Google
An Arabic webpage is more likely to be...
• indexed if it is present in a directory
• archived if it is present in DMOZ
• archived if it has neither Arabic GeoIP nor Arabic ccTLD
For right now, if you want your Arabic language webpage to be archived, host
it outside of an Arabic country and get it listed in DMOZ
Youssef Eldakar
Bibliotheca Alexandrina
• Since 2011, the BA crawls have focused on Egyptian
content
• Seeds are manually selected
• Future plans are to cover content related to the Arab
world
Current Work
Replacements for missing images
Goal:
Find missing images through their context and
discover a replacement for each image
Example:
Motivation
• D-Lib Magazine, Jan 2005:
“Transparent Format Migration of Preserved Web Content”
David S. H. Rosenthal, Thomas Lipkis, Thomas S. Robertson, and Seth Morabito
• The main idea was to migrate a file from a format that is no longer
understandable to a new format without changing the URI
• Can this be done for images with 404 responses?
• We can define a new response code with a Location header,
e.g. “210 Not Quite OK, But Close”
Sample log query
0.36.125.141 web.archive.org - [01/Jan/2011:01:30:58 +0000] "GET http://web.archive.org/web/20110101013058/http://www.slaverymuseum.org/IraAtTable.jpeg HTTP/1.1" 404 2135 "http://web.archive.org/web/20030413174118/www.slaverymuseum.org/home.htm" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10" TCP_MISS:SOURCEHASH_PARENT/207.241.227.95 205
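The requested URI and the referring URI are the two fields of interest in such an entry. The sketch below pulls them out with a regular expression, assuming the entry is space-delimited in the common access-log layout (client, host, identity, bracketed timestamp, quoted request, status, size, quoted referrer). The pattern and function name are illustrative and were not part of the presented work.

```python
import re

LOG_PATTERN = re.compile(
    r'^(?P<client>\S+) (?P<host>\S+) \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) "(?P<referrer>[^"]*)"'
)


def parse_request(line: str):
    """Return the named fields of one log entry, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None
```

Entries with status 404 whose requested URI ends in an image extension are the candidates for which a replacement would be sought.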
Check full URI in the IA
> curl -I http://web.archive.org/web/20110101013058/http://www.slaverymuseum.org/IraAtTable.jpeg
HTTP/1.1 404 Not Found
Server: Tengine/2.1.0
Date: Tue, 04 Aug 2015 18:17:46 GMT
Content-Type: text/html;charset=utf-8
Connection: keep-alive
set-cookie: wayback_server=73; Domain=archive.org; Path=/; Expires=Thu, 03-Sep-15 18:17:45 GMT;
X-Archive-Wayback-Runtime-Error: ResourceNotInArchiveException: http://www.slaverymuseum.org/IraAtTable.jpeg was not found
X-Archive-Wayback-Perf: {"IndexLoad":144,"IndexQueryTotal":144,"RobotsFetchTotal":2,"RobotsRedis":1,"RobotsTotal":2,"Total":390}
X-Archive-Playback: 0
URI requested
Referring URI
Check full URI in the live web
> curl -I http://www.slaverymuseum.org/IraAtTable.jpeg
HTTP/1.1 404 Not Found
Date: Tue, 04 Aug 2015 18:15:34 GMT
Server: Apache
Content-Type: text/html; charset=iso-8859-1
Check Timetravel
Check domain in the live web
> curl -I http://www.slaverymuseum.org
HTTP/1.1 301 Moved Permanently
Date: Tue, 04 Aug 2015 18:26:41 GMT
Server: Apache
Location: https://vimeo.com/search?q=slaverymuseum.org
Content-Type: text/plain; charset=UTF-8
Check image name in new page
• Not found
Check leaf page for image name
• Not found
Check domain in the IA
Check search engine for image surrounding text
• Using the “src” and saving the “alt” in HTML (alternative
information) as a backup.
e.g.
• Image src="IraAtTable.jpeg"
• alt="Ira Hunter, Jr. and Oni Lasana"
<img border="0" src="IraAtTable.jpeg" width="120" height="97" align="top" alt="Ira Hunter, Jr. and Oni Lasana ">
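Collecting the "src" and "alt" values from the referring page can be done with Python's standard HTML parser. The class and function names below are hypothetical; the sketch only shows how the two attributes that feed the search-engine query might be extracted.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect (src, alt) pairs for every img element in a page."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            alt = (attributes.get("alt") or "").strip()
            self.images.append((attributes.get("src"), alt))


def image_context(html: str):
    """Return the list of (src, alt) pairs found in the given HTML."""
    collector = ImgCollector()
    collector.feed(html)
    return collector.images
```

Applied to the image element above, this yields the file name IraAtTable.jpeg together with the alt text, either of which can then be sent to a search engine.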
Searching Google for (IraAtTable.jpeg)
Found same src name and parts of the surrounding text
http://signhom.net/professionalshub/wp-content/uploads/sites/3/2013/11/IraAtTable.jpg
> curl -I http://web.archive.org/web/20110101013058/http://www.slaverymuseum.org/IraAtTable.jpeg
210 Not Quite OK, But Close
Date: Wed, 05 Aug 2015 12:56:03 GMT
Location: http://signhom.net/professionalshub/wp-content/uploads/sites/3/2013/11/IraAtTable.jpg
New response code
Summary of approaches
• Check full URI in the live web
• Check full URI in the IA
• Check full URI in Timetravel
• Check domain in the live web
• Check domain in the IA
• Check images in the redirected webpage
• Check leaf pages
• Check surrounding text in search engines
• Compare results of different search engines using image
duplication, such as Google's large-scale analysis of images:
http://googleresearch.blogspot.com/2015/06/inceptionism-going-deeper-into-neural.html
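The approaches form a fallback chain: each check is tried in order until one yields a candidate replacement. A minimal sketch of that control flow follows. The individual checks (live web, IA, Timetravel, search engine) are represented by caller-supplied functions, since their implementations are not given in the slides; all names here are hypothetical.

```python
def find_replacement(image_uri, strategies):
    """Try each (name, check) pair in order; return the first hit as (name, candidate).

    Each check takes the missing image URI and returns a candidate
    replacement URI, or None when it finds nothing.
    """
    for name, check in strategies:
        candidate = check(image_uri)
        if candidate is not None:
            return name, candidate
    return None
```

Ordering the cheap checks (live web, archive lookup) before the expensive ones (search engines, image comparison) keeps the common case fast.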
Other ideas
Image de-duplication
• JCDL 2015:
“Identifying Duplicate and Contradictory Information in
Wikipedia”, by Sarah Weissman, Samet Ayhan, Joshua Bradley, Jimmy Lin
• Can we do the same for the archives by detecting and
removing duplicate images?
• How many duplicate images?
• Which version should be kept?
What has Justin been
up to, lately?
Justin F. Brunelle
Presentation for Herbert Van de Sompel
08/06/2015
A simpler time...
Mass hysteria. Human sacrifices. Dogs and
cats living together.
<iframe><script>...</script></iframe>
Missing resources (bad) and
Temporal violations (worse)
http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html
http://en.wikipedia.org/wiki/Main_Page January 18th, 2012
http://web.archive.org/web/20120118110520/http://en.wikipedia.org/wiki/Main_Page:
January 18th, 2012
Not all tools can crawl equally
Live Resource
PhantomJS Crawled
Heritrix Crawled, Wayback replayed
Current Workflow
• Dereference URI-Rs
• Archive representation
• Extract embedded URI-Rs
• Repeat
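The loop above can be sketched in a few lines. This is a toy illustration, not Heritrix's implementation: `fetch` is a caller-supplied function (an assumption) and the returned dict stands in for an archive.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class EmbeddedURIExtractor(HTMLParser):
    """Collect URIs found in src/href attributes of a representation."""

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.uris = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.uris.append(urljoin(self.base_uri, value))


def crawl(seeds, fetch, max_resources=100):
    """Dereference, archive, extract embedded URI-Rs, repeat.

    `fetch` maps a URI-R to its HTML text, or None if dereferencing fails.
    """
    frontier = deque(seeds)
    archive = {}
    while frontier and len(archive) < max_resources:
        uri_r = frontier.popleft()
        if uri_r in archive:
            continue
        body = fetch(uri_r)               # dereference the URI-R
        if body is None:
            continue
        archive[uri_r] = body             # archive the representation
        extractor = EmbeddedURIExtractor(uri_r)
        extractor.feed(body)              # extract embedded URI-Rs
        frontier.extend(extractor.uris)   # repeat with the new frontier
    return archive
```

Because this loop only reads markup, any URI-R that JavaScript would add at run time never reaches the frontier.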
Proposed Workflow
<script> tags alone are not indicative of a deferred
representation. JavaScript can be played back in the
archives!
Current workflow not suitable for deferred
representations
Use PhantomJS to run JavaScript, interact with the
representation
Two-tiered crawling approach to optimize
performance
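A two-tiered crawl could be wired together roughly as follows. Every callable here is a hypothetical stand-in: the slides do not describe the classifier's features, and `fast_crawl` and `headless_crawl` merely represent a Heritrix-style fetch and a PhantomJS-style fetch.

```python
def two_tiered_crawl(uri_rs, is_deferred, fast_crawl, headless_crawl):
    """Send a URI-R to the slower headless tier only when it is classified as deferred."""
    results = {}
    for uri_r in uri_rs:
        # Choose the tier per URI-R so most resources stay on the fast path.
        tier = headless_crawl if is_deferred(uri_r) else fast_crawl
        results[uri_r] = tier(uri_r)
    return results
```

The point of the split is cost: the headless tier runs JavaScript and is slower, so it is reserved for representations the classifier flags.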
More URI-Rs in the
crawl frontier
Runs more slowly but
more deeply
Run-time & Frontier size PhantomJS vs. Heritrix
To appear: iPres2015
Constructed a classifier for Deferred
Representations
Performance metrics of a two-tiered
crawling approach
The classifier helps crawl deferred
representations most efficiently
Current & Future Work

Using PhantomJS to execute actions on the client
– Pushing buttons
– Selecting drop-downs
– Archiving resulting representation changes

Represent representation state in WARCs
– Graph structure of embedded resources
– Replay in the Wayback Machine
http://ws-dl.blogspot.com/2015/06/2015-06-26-phantomjsvisualevent-or.html
Presented by Mat Kelly for Herbert Van de Sompel
Web Science and Digital Libraries Research Lab
Old Dominion University, Norfolk, VA
August 6, 2015
•  Software as a support vehicle
•  Issues under investigation for PhD research topic
•  Sample access patterns mitigated by new
Memento-related entities
HVDS Presentation
Building Software
as a PhD Researcher
Software as a Support Vehicle
•  Purpose: capture what user sees into
WARC
– instead of delegation-by-URI
•  Barriers:
– Restrictive browser extension API (Evolved/time)
– Wheel inventing (nothing for WARCs in JS)
•  Perks:
– Seeded private web archiving research
– Exposed hard-to-archive content
Website: http://warcreate.com
Blog: http://ws-dl.blogspot.com/2013/07/2013-07-10-warcreate-and-wail-warc.html
•  “Glue” between institutional tools
– hard to configure and use
•  Native binaries
– difficult to maintain but novel
•  Further facilitated private web archiving
interest
Website: http://matkelly.com/wail
Blog: http://ws-dl.blogspot.com/2013/07/2013-07-10-warcreate-and-wail-warc.html
•  Integrates live + archived web experience
•  Become familiar with Memento
dynamics & usage patterns
•  Provide eventual hook into new entities
Website: http://matkelly.com/mink
Blog: http://ws-dl.blogspot.com/2014/10/2014-10-03-integrating-live-and.html
•  Given same input (URI), tools produce
varying output
•  Experiment to measure variance
•  Identified hard-to-archive resources
•  Highlighted cutting edge browser-crawler
Website: http://acid.matkelly.com
Blog: http://ws-dl.blogspot.com/2014/07/2014-07-14-archival-acid-test.html
Current Research
[Diagram, built up over several slides: several private archives and other archives contributing to a TimeMap; t = k ≠ t = k-1]
90 DAYS AT A TIME
ONLY BACK TO ONE YEAR!
1 year ago   2 years ago   10 years ago   …   180 days ago
TimeMap
private archive
Facebook.com replay
What is expected   What the tools captured
Internet Archive
public, aggregated
Archive.today
public, aggregated
Foo Archives
public, non-aggregated
My web archive
private, non-aggregated
time →
Archives capturing
My homepage
Changes to
my homepage
Sample Access
Patterns
OR
TimeMap
•  More mementos from a superset of sources
TimeMap
•  Patterns 1 and 2 are status quo
– provided by framework
•  Querying web archives currently only
considers public web content
– URI for lookup
•  Framework introduces 2 new entities
–  Memento Meta Aggregator (MMA)
–  Private Web Archive Adapter (PWAA)
•  Functional superset of a Memento Aggregator (MA)
•  Can act as intermediary client to relay MA
results to ultimate user
•  Allows just-in-time (JIT) inclusion of
archives
– as specified at query time
•  Set of archives aggregated can be dynamic
– e.g., Results must not include IA
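A minimal sketch of the meta-aggregation idea: results from already-aggregated archives are merged with archives named at query time, and named archives can be left out. The function names, the `fake_archive` helper, and the archive names are all hypothetical; the capture counts mirror the 100 / 30 / 10 / 15 figures used in the slides that follow.

```python
def meta_aggregate(uri_r, aggregated, just_in_time=None, exclude=()):
    """Merge mementos from aggregated archives and archives supplied at query time.

    `aggregated` and `just_in_time` map an archive name to a lookup function
    (URI-R -> list of (datetime, URI-M) pairs); `exclude` names archives to skip.
    """
    sources = dict(aggregated)
    sources.update(just_in_time or {})
    mementos = []
    for name, lookup in sources.items():
        if name in exclude:
            continue
        mementos.extend(lookup(uri_r))
    return sorted(set(mementos))


def fake_archive(name, count):
    """Hypothetical stand-in for an archive holding `count` mementos."""
    return lambda uri_r: [
        (f"t{i:05d}", f"http://{name}.example/{i}/{uri_r}") for i in range(count)
    ]


public = {n: fake_archive(n, c) for n, c in [("a", 100), ("b", 30), ("c", 10)]}
mine = {"my-archive": fake_archive("my-archive", 15)}

print(len(meta_aggregate("http://example.com/", public)))                        # 140
print(len(meta_aggregate("http://example.com/", public, mine)))                  # 155
print(len(meta_aggregate("http://example.com/", public, mine, exclude={"a"})))   # 55
```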
[Diagram, built up over several slides: "MY CAPTURES" and "MY BANK CAPTURES" (my web archives) beside various public web archives holding 100, 30, and 10 captures. The public archives aggregate to 140 captures; my archives are NOT AGGREGATED.]
Access via the Meta Aggregator
…allows our archives to be included
[Diagram: with 15 captures from our archives included, the total rises from 140 to 155.]
[Diagram: nested aggregation over "MY CAPTURES", "MY BANK CAPTURES", …, "Bob's public CAPTURES", "The organization's public CAPTURES 1", and "The organization's public CAPTURES 2". One set contains A, B, C, D; another contains B, C, D; another contains C, D. Capture counts shown: 10, 5, 15, 15, 20, 35, 35, 15, 50, 50.]
•  Allow dynamic and JIT set of archives
•  Superset can be recursively constructed
•  Sets can be shared
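Recursive construction of a superset can be sketched as flattening named sets whose members are either archives or other shared sets. The registry layout and every name in it are hypothetical.

```python
def resolve(archive_set, registry, seen=None):
    """Flatten a named set whose members are archives or other named sets."""
    seen = set() if seen is None else seen
    if archive_set in seen:              # guard against sets that include each other
        return set()
    seen.add(archive_set)
    members = set()
    for member in registry.get(archive_set, ()):
        if member in registry:           # member is itself a shared set
            members |= resolve(member, registry, seen)
        else:                            # member is a concrete archive
            members.add(member)
    return members


registry = {
    "org-public": {"org-archive-1", "org-archive-2"},
    "bobs-set": {"bob-public", "org-public"},
    "my-set": {"my-public", "bobs-set", "internet-archive"},
}
print(sorted(resolve("my-set", registry)))
```

Sharing a set then amounts to publishing its name; anyone who includes it picks up its members, and their members in turn.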
My public captures can be integrated with public web archives'
•  Regulates access to Private Web
Archives (PWAs)
•  Acts as token authorizer
•  With correct credentials, relays results
as if querying the PWA directly
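The key-for-token exchange can be sketched as below. The class and method names are hypothetical, and the token scheme (a random hex string held in memory) is an assumption chosen only to keep the example short.

```python
import secrets


class PrivateWebArchiveAdapter:
    """Hypothetical PWAA: exchanges a key for a token, then relays lookups."""

    def __init__(self, archive_lookup, valid_keys):
        self._lookup = archive_lookup      # URI-R -> list of mementos
        self._valid_keys = set(valid_keys)
        self._tokens = set()

    def get_token(self, key):
        """Return a fresh token for a valid key, else None."""
        if key not in self._valid_keys:
            return None
        token = secrets.token_hex(8)
        self._tokens.add(token)
        return token

    def get_mementos(self, uri_r, token):
        """Relay the archive's results only for a token this adapter issued."""
        if token not in self._tokens:
            return []                      # access denied: zero mementos
        return self._lookup(uri_r)
```

With a valid key the caller sees exactly what the private archive would return; with no token or an unknown one, the caller gets nothing back.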
[Diagram, built up over several slides: "MY CAPTURES" holds 3 captures and "MY BANK CAPTURES" holds 10,000 captures; the public archives hold 100, 30, and 10 captures.]
GET TOKEN for PWA, Key: abcd1234
ACCESS OK, Token: 4f33c64
GET mementos for URI, Token: 4f33c64
Token: 4f33c64 OK
Returning mementos for URI
TimeMap: 140 captures + 3 captures + 10,000 captures = 10,143 captures
...
, <http://web.archive.org/web/20150228155703/https://facebook.com/>;rel="memento"; datetime="Sat, 28 Feb 2015 15:57:03 GMT"
, <http://web.archive.org/web/20150228163939/http://www.facebook.com/>;rel="memento"; datetime="Sat, 28 Feb 2015 16:39:39 GMT"
, <http://web.archive.org/web/20150303162841/https://www.facebook.com/>;rel="memento"; datetime="Tue, 03 Mar 2015 16:28:41 GMT"
, <http://users2machine.local/web/20150305000101/https://www.facebook.com/>;rel="memento"; datetime="Thu, 05 Mar 2015 00:01:00 GMT"; key="e395935019ee467c797034ee410cc91e"
, <//wayback.archive-it.org/all/20150305215922/https://facebook.com/>;rel="memento"; datetime="Thu, 05 Mar 2015 21:59:22 GMT"
, <http://previouslyUnaggregated.org/web/20150306123457/https://www.facebook.com/>;rel="memento"; datetime="Fri, 06 Mar 2015 12:34:57 GMT"
, <http://web.archive.org/web/20150310140721/https://www.facebook.com/>;rel="memento"; datetime="Tue, 10 Mar 2015 14:07:21 GMT"
...
TimeMap
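A rough sketch of reading such a TimeMap and picking out the entries that carry the extra `key` attribute. The regular expressions are a simplification that only handles quoted attribute values like those shown here, not the full link-format grammar, and the function names are hypothetical.

```python
import re

# One entry: <URI> followed by zero or more ;name="value" attributes.
ENTRY = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM = re.compile(r'([\w-]+)="([^"]*)"')


def parse_timemap(text):
    """Parse link-format entries into dicts holding the URI and its attributes."""
    entries = []
    for uri, params in ENTRY.findall(text):
        entry = {"uri": uri.strip()}
        entry.update(PARAM.findall(params))   # (name, value) pairs
        entries.append(entry)
    return entries


def private_mementos(entries):
    """Entries that carry the `key` attribute shown in this TimeMap."""
    return [e for e in entries if "key" in e]
```

A client that ignores unknown attributes would still read the `rel` and `datetime` of every entry.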
MY PRIVATE FACEBOOK CAPTURES
NOT RFC 5988 COMPLIANT!
MY PUBLIC FACEBOOK CAPTURES
[Diagram: "MY CAPTURES" (3 captures) and "MY BANK CAPTURES" (10,000 captures)]
GET mementos for URI, Token: 4f33c64
GET mementos for URI, Token: c5463b4
GET TOKEN for PWA, Key: 2265eef3
No/invalid token returned
Access denied or 0 mementos
[Diagram, built up over several slides: "MY BANK CAPTURES", "Linda's Private Captures", and "Bob's Private Captures", holding 3, 5, and 10 captures.]
GET TOKENs for PWAs
Key: abcd1234, Archive: My
Key: cab45cbf, Archive: Linda
Key: b0b01b, Archive: Bob
Access OK, Token: 7790ca
Access OK, Token: b0b01b
ACCESS DENIED
GET mementos for URI
Token: 7790ca, Archive: My
Token: null, Archive: Linda
Token: b0b01b, Archive: Bob
Mementos returned: 3, 10, ø (13 in total)
New Software:
•  Preserve Private Web Content
•  Simulate & Quickly Deploy Private Web Archives
•  Interface with New Entities Using Memento
•  Background research on state-of-the-art
•  Exploring use cases
– Existing, anticipated, and fabricated
•  Resisting desire to code
•  Why?
– No means exists to integrate private and public
web archives.
•  How to Evaluate?
– Does this framework fit real world needs?
Scalable?
•  When will I know I am done?
– Any public/private web archive* can be
integrated.
* Memento-compliant

More Related Content

What's hot

Learning & Web 2.0: It's all about Play!
Learning & Web 2.0:  It's all about Play!Learning & Web 2.0:  It's all about Play!
Learning & Web 2.0: It's all about Play!hblowers
 
Emerging Technologies in the Library
Emerging Technologies in the LibraryEmerging Technologies in the Library
Emerging Technologies in the LibrarySamantha Chada
 
Telling Stories with Web Archives
Telling Stories with Web ArchivesTelling Stories with Web Archives
Telling Stories with Web ArchivesMichele Weigle
 
Just a Room Full of Stuff? Why Libraries are Great / Katie Birkwood
Just a Room Full of Stuff? Why Libraries are Great / Katie BirkwoodJust a Room Full of Stuff? Why Libraries are Great / Katie Birkwood
Just a Room Full of Stuff? Why Libraries are Great / Katie BirkwoodKatie Birkwood
 
Finding the Phoenix: Feathers, Flight & the Future of Libraries
Finding the Phoenix: Feathers, Flight & the Future of LibrariesFinding the Phoenix: Feathers, Flight & the Future of Libraries
Finding the Phoenix: Feathers, Flight & the Future of Librarieshblowers
 
Why We Need Multiple Archives
Why We Need Multiple ArchivesWhy We Need Multiple Archives
Why We Need Multiple ArchivesMichael Nelson
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Michael Nelson
 
What mean ye storytelling- the #etmooc version
What mean ye storytelling- the #etmooc versionWhat mean ye storytelling- the #etmooc version
What mean ye storytelling- the #etmooc versionAlan Levine
 
Cardiff - Web 2.0 & Library 2.0
Cardiff - Web 2.0 & Library 2.0Cardiff - Web 2.0 & Library 2.0
Cardiff - Web 2.0 & Library 2.0daveyp
 
Emerging Technologies for Libraries and Librarians, 2013
Emerging Technologies for Libraries and Librarians, 2013Emerging Technologies for Libraries and Librarians, 2013
Emerging Technologies for Libraries and Librarians, 2013Jennifer Baxmeyer
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Michael Nelson
 
Shots in the dark : Information Literacy in the 21st century
Shots in the dark : Information Literacy in the 21st centuryShots in the dark : Information Literacy in the 21st century
Shots in the dark : Information Literacy in the 21st centuryPeter Godwin
 
Skills That Transfer: Transliteracy and the Global Librarian (ACRL/NY 2011 Sy...
Skills That Transfer: Transliteracy and the Global Librarian (ACRL/NY 2011 Sy...Skills That Transfer: Transliteracy and the Global Librarian (ACRL/NY 2011 Sy...
Skills That Transfer: Transliteracy and the Global Librarian (ACRL/NY 2011 Sy...Lane Wilkinson
 
Web 2 For Free
Web 2 For FreeWeb 2 For Free
Web 2 For Freedavenolan
 
Illuminating Learning Communities Through School Libraries and Makerspaces C...
Illuminating  Learning Communities Through School Libraries and MakerspacesC...Illuminating  Learning Communities Through School Libraries and MakerspacesC...
Illuminating Learning Communities Through School Libraries and Makerspaces C...Buffy Hamilton
 
Handheld Librarian 7 Online Conference - August 15, 2012
Handheld Librarian 7 Online Conference - August 15, 2012 Handheld Librarian 7 Online Conference - August 15, 2012
Handheld Librarian 7 Online Conference - August 15, 2012 Robin M. Ashford, MSLIS
 
ARCLib - Web 2.0 and Library 2.0
ARCLib - Web 2.0 and Library 2.0ARCLib - Web 2.0 and Library 2.0
ARCLib - Web 2.0 and Library 2.0daveyp
 
Connected But Lonely: How Constant Connectivity Is Affecting Us
Connected But Lonely: How Constant Connectivity Is Affecting UsConnected But Lonely: How Constant Connectivity Is Affecting Us
Connected But Lonely: How Constant Connectivity Is Affecting Ushailey9
 

What's hot (20)

Learning & Web 2.0: It's all about Play!
Learning & Web 2.0:  It's all about Play!Learning & Web 2.0:  It's all about Play!
Learning & Web 2.0: It's all about Play!
 
Emerging Technologies in the Library
Emerging Technologies in the LibraryEmerging Technologies in the Library
Emerging Technologies in the Library
 
Telling Stories with Web Archives
Telling Stories with Web ArchivesTelling Stories with Web Archives
Telling Stories with Web Archives
 
Just a Room Full of Stuff? Why Libraries are Great / Katie Birkwood
Just a Room Full of Stuff? Why Libraries are Great / Katie BirkwoodJust a Room Full of Stuff? Why Libraries are Great / Katie Birkwood
Just a Room Full of Stuff? Why Libraries are Great / Katie Birkwood
 
Finding the Phoenix: Feathers, Flight & the Future of Libraries
Finding the Phoenix: Feathers, Flight & the Future of LibrariesFinding the Phoenix: Feathers, Flight & the Future of Libraries
Finding the Phoenix: Feathers, Flight & the Future of Libraries
 
Why We Need Multiple Archives
Why We Need Multiple ArchivesWhy We Need Multiple Archives
Why We Need Multiple Archives
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
 
What mean ye storytelling- the #etmooc version
What mean ye storytelling- the #etmooc versionWhat mean ye storytelling- the #etmooc version
What mean ye storytelling- the #etmooc version
 
Cardiff - Web 2.0 & Library 2.0
Cardiff - Web 2.0 & Library 2.0Cardiff - Web 2.0 & Library 2.0
Cardiff - Web 2.0 & Library 2.0
 
Emerging Technologies for Libraries and Librarians, 2013
Emerging Technologies for Libraries and Librarians, 2013Emerging Technologies for Libraries and Librarians, 2013
Emerging Technologies for Libraries and Librarians, 2013
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
 
Shots in the dark : Information Literacy in the 21st century
Shots in the dark : Information Literacy in the 21st centuryShots in the dark : Information Literacy in the 21st century
Shots in the dark : Information Literacy in the 21st century
 
Info2011
Info2011Info2011
Info2011
 
Skills That Transfer: Transliteracy and the Global Librarian (ACRL/NY 2011 Sy...
Skills That Transfer: Transliteracy and the Global Librarian (ACRL/NY 2011 Sy...Skills That Transfer: Transliteracy and the Global Librarian (ACRL/NY 2011 Sy...
Skills That Transfer: Transliteracy and the Global Librarian (ACRL/NY 2011 Sy...
 
Wizard of Apps Revised
Wizard of Apps RevisedWizard of Apps Revised
Wizard of Apps Revised
 
Web 2 For Free
Web 2 For FreeWeb 2 For Free
Web 2 For Free
 
Illuminating Learning Communities Through School Libraries and Makerspaces C...
Illuminating  Learning Communities Through School Libraries and MakerspacesC...Illuminating  Learning Communities Through School Libraries and MakerspacesC...
Illuminating Learning Communities Through School Libraries and Makerspaces C...
 
Handheld Librarian 7 Online Conference - August 15, 2012
Handheld Librarian 7 Online Conference - August 15, 2012 Handheld Librarian 7 Online Conference - August 15, 2012
Handheld Librarian 7 Online Conference - August 15, 2012
 
ARCLib - Web 2.0 and Library 2.0
ARCLib - Web 2.0 and Library 2.0ARCLib - Web 2.0 and Library 2.0
ARCLib - Web 2.0 and Library 2.0
 
Connected But Lonely: How Constant Connectivity Is Affecting Us
Connected But Lonely: How Constant Connectivity Is Affecting UsConnected But Lonely: How Constant Connectivity Is Affecting Us
Connected But Lonely: How Constant Connectivity Is Affecting Us
 

Viewers also liked

Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptCombining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptMichael Nelson
 
Profiling Web Archives
Profiling Web ArchivesProfiling Web Archives
Profiling Web ArchivesMichael Nelson
 
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...Michael Nelson
 
Evaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived PagesEvaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived PagesMichael Nelson
 
Using Web Archives to Enrich the Live Web Experience Through Storytelling
Using Web Archives to Enrich the Live Web Experience Through StorytellingUsing Web Archives to Enrich the Live Web Experience Through Storytelling
Using Web Archives to Enrich the Live Web Experience Through StorytellingYasmin AlNoamany, PhD
 
The Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingThe Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingMichael Nelson
 
When Should I Make Preservation Copies of Myself?
When Should I Make Preservation Copies of Myself?�When Should I Make Preservation Copies of Myself?�
When Should I Make Preservation Copies of Myself?Michael Nelson
 
Software as a Well-Formed Research Object
Software as a Well-Formed Research ObjectSoftware as a Well-Formed Research Object
Software as a Well-Formed Research ObjectYasmin AlNoamany, PhD
 
Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member Michael Nelson
 
Assessing the Quality of Web Archives
Assessing the Quality of Web ArchivesAssessing the Quality of Web Archives
Assessing the Quality of Web ArchivesMichael Nelson
 
Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench ToolEvaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench ToolMichael Nelson
 
On the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeOn the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeMichael Nelson
 
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingWho Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingMichael Nelson
 
Web Archiving: A Brief Introduction
Web Archiving: A Brief IntroductionWeb Archiving: A Brief Introduction
Web Archiving: A Brief IntroductionSawood Alam
 
Who and What Links to the Internet Archive
Who and What Links to the Internet ArchiveWho and What Links to the Internet Archive
Who and What Links to the Internet ArchiveMichael Nelson
 
Storytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesStorytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesMichael Nelson
 
Profiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content LanguageProfiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content LanguageMichael Nelson
 
More Archives, More Better
More Archives, More Better More Archives, More Better
More Archives, More Better Michael Nelson
 
Summarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniquesSummarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniquesMichael Nelson
 
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Michael Nelson
 

Viewers also liked (20)

Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptCombining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
 
Profiling Web Archives
Profiling Web ArchivesProfiling Web Archives
Profiling Web Archives
 
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
 
Evaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived PagesEvaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived Pages
 
Using Web Archives to Enrich the Live Web Experience Through Storytelling
Using Web Archives to Enrich the Live Web Experience Through StorytellingUsing Web Archives to Enrich the Live Web Experience Through Storytelling
Using Web Archives to Enrich the Live Web Experience Through Storytelling
 
The Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingThe Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web Archiving
 
When Should I Make Preservation Copies of Myself?
When Should I Make Preservation Copies of Myself?�When Should I Make Preservation Copies of Myself?�
When Should I Make Preservation Copies of Myself?
 
Software as a Well-Formed Research Object
Software as a Well-Formed Research ObjectSoftware as a Well-Formed Research Object
Software as a Well-Formed Research Object
 
Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member
 
Assessing the Quality of Web Archives
Assessing the Quality of Web ArchivesAssessing the Quality of Web Archives
Assessing the Quality of Web Archives
 
Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench ToolEvaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
 
On the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeOn the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over Time
 
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingWho Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
 
Web Archiving: A Brief Introduction
Web Archiving: A Brief IntroductionWeb Archiving: A Brief Introduction
Web Archiving: A Brief Introduction
 
Who and What Links to the Internet Archive
Who and What Links to the Internet ArchiveWho and What Links to the Internet Archive
Who and What Links to the Internet Archive
 
Storytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesStorytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web Archives
 
Profiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content LanguageProfiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content Language
 
More Archives, More Better
More Archives, More Better More Archives, More Better
More Archives, More Better
 
Summarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniquesSummarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniques
 
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
 

@WebSciDL PhD Student Project Reviews August 5&6, 2015