A Research Agenda for
"Obsolete Data or Resources"

       Michael L. Nelson
       @phonedude_mln


       A Research Agenda for "Obsolete Data or Resources"
    Web Archiving Cooperative Workshop, Stanford, June 29, 2012
Biographical Side Note…




Growing Up in Virginia…




First Job: NASA Langley Research Center




My Research Group

Be Lazy
• lazy preservation
• just-in-time preservation

Get Active
• modify server
• enhance objects

Archive Quality
• APIs and services
• object quality

Better Tools
• ajax archiving
• temporal intention
• personal archiving



Why Care About The Past?


From an anonymous WWW 2010 reviewer about our
Memento paper (emphasis mine):

"Is there any statistics to show that many or a good number of Web
users would like to get obsolete data or resources? "




Two Common Misconceptions
          about Web Archiving

• Prior = old = obsolete = bad = contaminated
  – who cares, old versions are to be removed

• The Internet Archive has every copy of
  everything that has ever existed
   – who cares, problem solved




Current pages about the past
don't have the same impact as
      pages from the past




vs.




(thanks to Michele Weigle for the following Memento selection)
What have we, the archiving community,
            done wrong?




Wrong Metaphor for Web Archives




Web Archives Are Not Destinations




This is a destination.                              This is not a destination.




Possible Metaphor for Web Archives?




Turn Archiving Into A Social Activity




                      see also: http://xkcd.com/1034/


Pinterest: A First Step?

http://media-cache-ec3.pinterest.com/upload/47639708527755289_AhxhItiQ_c.jpg
is a memento of:
http://3.bp.blogspot.com/_d0vByWRfhvU/S_Ygk_oX4xI/AAAAAAAACCQ/LXgC3S0KYEo/s400/_MG_8091.jpg
but there is no machine-readable indication of this relationship
repins are by-reference




Why doesn't the web have a better
        notion of time?




TBL on Generic vs. Specific Resources



                                       http://www.w3.org/DesignIssues/Generic.html




In The Beginning… there was the inode

struct stat {
    dev_t       st_dev;                 /*     ID of device containing file */
    ino_t       st_ino;                 /*     inode number */
    mode_t      st_mode;                /*     protection */
    nlink_t     st_nlink;               /*     number of hard links */
    uid_t       st_uid;                 /*     user ID of owner */
    gid_t       st_gid;                 /*     group ID of owner */
    dev_t       st_rdev;                /*     device ID (if special file) */
    off_t       st_size;                /*     total size, in bytes */
    blksize_t   st_blksize;             /*     blocksize for filesystem I/O */
    blkcnt_t    st_blocks;              /*     number of blocks allocated */
    time_t      st_atime;               /*     time of last access */
    time_t      st_mtime;               /*     time of last modification */
    time_t      st_ctime;               /*     time of last status change */
};
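
As a small illustration, the three timestamps in the struct above can be read from Python through os.stat. This is only a sketch; the helper name file_times is made up for this example. Note that none of the three fields records earlier versions of the file, only the most recent access, modification, and status change.

```python
import os
from datetime import datetime, timezone


def file_times(path):
    """Return the three inode timestamps for path as UTC datetimes."""
    st = os.stat(path)

    def to_utc(seconds):
        return datetime.fromtimestamp(seconds, tz=timezone.utc)

    return {
        "atime": to_utc(st.st_atime),  # time of last access
        "mtime": to_utc(st.st_mtime),  # time of last modification
        "ctime": to_utc(st.st_ctime),  # time of last status change (on Unix)
    }


if __name__ == "__main__":
    # "." is used only so the example runs anywhere; any path works.
    for name, when in file_times(".").items():
        print(name, when.isoformat())
```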


Limited Time Semantics…
% telnet www.digitalpreservation.gov 80
Trying 140.147.249.7...
Connected to www.digitalpreservation.gov.
Escape character is '^]'.
HEAD /images/ndiipp_header6.jpg HTTP/1.1
Host: www.digitalpreservation.gov
Connection: close

HTTP/1.1 200 OK
Date: Mon, 19 Jul 2010 21:41:04 GMT
Server: Apache
Last-Modified: Thu, 18 Jun 2009 16:25:54 GMT
ETag: "1bc861-10935-dca24880"
Accept-Ranges: bytes
Content-Length: 67893
Connection: close
Content-Type: image/jpeg

Connection closed by foreign host.
Time Semantics Becoming Less,
      Not More Available
 % telnet www.digitalpreservation.gov 80
 Trying 140.147.249.7...
 Connected to www.digitalpreservation.gov.
 Escape character is '^]'.
 HEAD / HTTP/1.1
 Host: www.digitalpreservation.gov
 Connection: close

 HTTP/1.1 200 OK
 Date: Mon, 19 Jul 2010 21:36:00 GMT
 Server: Apache
 Accept-Ranges: bytes
 Connection: close
 Content-Type: text/html

 Connection closed by foreign host.
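
The difference between the two responses can be made concrete with a short sketch that parses the headers shown on these slides. The function names are invented for illustration; the header text is copied from the two HEAD responses. The image response carries Last-Modified, so an age can be computed; the home page response does not, so the only time a client learns is the time of the request itself.

```python
from email.utils import parsedate_to_datetime

# Response headers copied from the HEAD request for the image.
IMAGE_RESPONSE = """\
HTTP/1.1 200 OK
Date: Mon, 19 Jul 2010 21:41:04 GMT
Server: Apache
Last-Modified: Thu, 18 Jun 2009 16:25:54 GMT
ETag: "1bc861-10935-dca24880"
Accept-Ranges: bytes
Content-Length: 67893
Connection: close
Content-Type: image/jpeg
"""

# Response headers copied from the HEAD request for the home page.
HOME_PAGE_RESPONSE = """\
HTTP/1.1 200 OK
Date: Mon, 19 Jul 2010 21:36:00 GMT
Server: Apache
Accept-Ranges: bytes
Connection: close
Content-Type: text/html
"""


def parse_headers(raw):
    """Map lower-cased header names to values, skipping the status line."""
    headers = {}
    for line in raw.splitlines()[1:]:
        if ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()] = value.strip()
    return headers


def age_at_response(raw):
    """Return Date minus Last-Modified, or None if Last-Modified is absent."""
    headers = parse_headers(raw)
    if "last-modified" not in headers:
        return None
    served = parsedate_to_datetime(headers["date"])
    modified = parsedate_to_datetime(headers["last-modified"])
    return served - modified


if __name__ == "__main__":
    print("image:    ", age_at_response(IMAGE_RESPONSE))
    print("home page:", age_at_response(HOME_PAGE_RESPONSE))
```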


The Past Links to the Present…




                                                        explicit HTML link;
                                                         no HTTP links;
                                                           opaque URI




The Past Links to the Present…

                                                                    no HTML links;
                                                                    no HTTP links;
                                                                   implicit from URI




But the Present Does Not Link to the Past
                                                                  no hints in HTML,
                                                                    HTTP, or URI


                                                    % telnet www.digitalpreservation.gov 80
                                                    Trying 140.147.249.7...
                                                    Connected to www.digitalpreservation.gov.
                                                    Escape character is '^]'.
                                                    HEAD / HTTP/1.1
                                                    Host: www.digitalpreservation.gov
                                                    Connection: close

                                                    HTTP/1.1 200 OK
                                                    Date: Mon, 19 Jul 2010 21:36:00 GMT
                                                    Server: Apache
                                                    Accept-Ranges: bytes
                                                    Connection: close
                                                    Content-Type: text/html

                                                    Connection closed by foreign host.



Linking the Past and the Present

• Codify existing methods to create linkage from the
  past to the present
   – easy: an archived version knows for which URI it is an
     archived version
• Create a linkage from the present to the past
   – hard: solve with a level of indirection from present to past
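
One way to picture the two linkages is as HTTP Link headers using the relation names "original" and "timegate" that appear in the TimeMap in this deck. The sketch below only formats header values; the helper name link_header is hypothetical, and the URIs are borrowed from the takebackthe20.com example purely for illustration.

```python
def link_header(relations):
    """Format (uri, rel) pairs as the value of an HTTP Link header."""
    return ", ".join('<{}>; rel="{}"'.format(uri, rel) for uri, rel in relations)


ORIGINAL = "http://www.takebackthe20.com/"
TIMEGATE = "http://mementoproxy.cs.odu.edu/aggr/timegate/" + ORIGINAL

# Past -> present (easy): an archived copy names the URI it is a version of.
memento_links = link_header([(ORIGINAL, "original")])

# Present -> past (hard): the live resource points at a TimeGate, the level
# of indirection that can lead a client to an archived version.
original_links = link_header([(TIMEGATE, "timegate")])

if __name__ == "__main__":
    print("Link:", memento_links)
    print("Link:", original_links)
```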




The Web with Time Dimension added by Memento




The archival record is incomplete




Va Tech Shooting -- Only 3 Mementos


                                                                     do you remember when
                                                                     it was thought to be a
                                                                     domestic disturbance of
                                                                     limited scope and they
                                                                     had a suspect in custody?




Palin Crosshairs and takebackthe20.com
This website was published in fall of 2010

January 8, 2011: 6 dead, 14 wounded, including a critically injured Giffords

later that day, takebackthe20.com is taken offline
(see: http://huff.to/QnHA6x -- it notes the absence of the page in the
Wayback Machine without mention of the 6-12 month quarantine)




What Was The Original Image?



                                                                  the present web mostly
                                                                  agrees, but there are
                                                                  variations on the theme…




Timemap for takebackthe20.com
% curl http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.takebackthe20.com/
<http://mementoproxy.cs.odu.edu/aggr/timebundle/http://www.takebackthe20.com/>;rel="timebundle",
<http://www.takebackthe20.com/>;rel="original",
<http://http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.takebackthe20.com/>
 ;rel="timemap";type="application/link-format",
<http://mementoproxy.cs.odu.edu/aggr/timegate/http://www.takebackthe20.com/>;rel="timegate",
<http://api.wayback.archive.org/memento/20100925222153/http://www.takebackthe20.com/>
 ;rel="first memento";datetime="Sat, 25 Sep 2010 22:21:53 GMT",
<http://api.wayback.archive.org/memento/20100926095121/http://www.takebackthe20.com/>;rel="memento"
 ;datetime="Sun, 26 Sep 2010 09:51:21 GMT",
<http://api.wayback.archive.org/memento/20101001175313/http://www.takebackthe20.com/>;rel="memento"
 ;datetime="Fri, 01 Oct 2010 17:53:13 GMT",
[deletion of about 11 mementos]
<http://api.wayback.archive.org/memento/20101202224145/http://www.takebackthe20.com/>;rel="memento"
 ;datetime="Thu, 02 Dec 2010 22:41:45 GMT",
<http://api.wayback.archive.org/memento/20101202231759/http://www.takebackthe20.com/>;rel="memento"
 ;datetime="Thu, 02 Dec 2010 23:17:59 GMT",
<http://api.wayback.archive.org/memento/20101206123128/http://www.takebackthe20.com/>
 ;rel="last memento";datetime="Mon, 06 Dec 2010 12:31:28 GMT"


                       The last memento is about 1 month before the shooting.
                       Ironically, we can document the original image, but not
                       the post-shooting event. www.takebackthe20.com is now
                       an anti-Palin lapsed domain.
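
The "about 1 month" gap can be checked by parsing the TimeMap. The sketch below is a deliberately simple regular-expression parser, not a full link-format parser; the function names are invented for illustration, and the sample text is a subset of the entries shown above (the elided mementos are left out rather than guessed).

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# A few entries copied from the TimeMap above.
TIMEMAP = """\
<http://www.takebackthe20.com/>;rel="original",
<http://api.wayback.archive.org/memento/20100925222153/http://www.takebackthe20.com/>
 ;rel="first memento";datetime="Sat, 25 Sep 2010 22:21:53 GMT",
<http://api.wayback.archive.org/memento/20101202231759/http://www.takebackthe20.com/>;rel="memento"
 ;datetime="Thu, 02 Dec 2010 23:17:59 GMT",
<http://api.wayback.archive.org/memento/20101206123128/http://www.takebackthe20.com/>
 ;rel="last memento";datetime="Mon, 06 Dec 2010 12:31:28 GMT"
"""

# <uri> ; rel="..." with an optional ; datetime="..." attribute.
ENTRY = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"(?:\s*;\s*datetime="([^"]+)")?')


def parse_timemap(text):
    """Return (uri, set of rel values, datetime or None) for each entry."""
    entries = []
    for uri, rel, when in ENTRY.findall(text):
        stamp = parsedate_to_datetime(when) if when else None
        entries.append((uri, set(rel.split()), stamp))
    return entries


def last_memento(entries):
    """Return the datetime of the most recent memento in the TimeMap."""
    return max(stamp for _, rels, stamp in entries if "memento" in rels)


if __name__ == "__main__":
    shooting = datetime(2011, 1, 8, tzinfo=timezone.utc)  # date from the slide
    gap = shooting - last_memento(parse_timemap(TIMEMAP))
    print("days between last memento and the shooting:", gap.days)
```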

Reconciling the live web with
 what we find in the archives




Richard Grenell Removing His Tweets




2010 Archive of Grenell's Site…




…But the 2008 Content Is Missing




2008 Content on Live Site
  But Do You Trust It?




Sci-Fi / Alternate History




http://2012.talkingpointsmemo.com/2012/06/richard-mourdock-obamacare-youtube-accident.php

Sometimes Shared Social Media Persists…




Social media archives are more than
 just fodder for The Daily Show…




An Intact Tweet From the Egyptian Revolution




https://twitter.com/miss_amy_qb/status/32477898581483521

slide from Hany SalahEldeen
These Tweets Have Lost Their Content
                           and Their Meaning




https://twitter.com/aishes/status/32485352102952960
Missing ?

https://twitter.com/omar_chaaban/status/32203697597452289

slide from Hany SalahEldeen
Estimating Shared Resource Loss in Social Media
               for Other Socially Significant Events




to appear in
TPDL 2012
More archives = more better




Wayback Machine




http://web.archive.org/web/20030129185239/http://www4.cnn.com/
http://web.archive.org/web/20030131093102/http://cnn.com/
http://web.archive.org/web/20040102095249/http://www3.cnn.com/
etc.

URI Rewriting Makes for Nice Archives




The link to: http://i.cdn.turner.com/cnn/2009/TRAVEL/10/26/overseas.visitors.travel/c1main.liberty.gi.jpg
is dynamically rewritten, using JavaScript, to:
http://web.archive.org/web/20091027043308/http://i.cdn.turner.com/cnn/2009/TRAVEL/10/26/overseas.visitors.travel/c1main.liberty.gi.jpg
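
The rewriting pattern in this example is simple enough to sketch at the string level. The code below only mirrors the URI shape visible on the slide (archive prefix, 14-digit timestamp, original URI); it is not the Wayback Machine's own rewriting code, and the function names are made up for illustration.

```python
import re

WAYBACK_PREFIX = "http://web.archive.org/web/"
ARCHIVED = re.compile(r"^http://web\.archive\.org/web/(\d{14})/(.+)$")


def to_archived(uri, timestamp):
    """Rewrite an original URI into a Wayback-style URI (14-digit timestamp)."""
    return "{}{}/{}".format(WAYBACK_PREFIX, timestamp, uri)


def to_original(archived_uri):
    """Recover (timestamp, original URI) from a Wayback-style URI, else None."""
    match = ARCHIVED.match(archived_uri)
    return match.groups() if match else None


if __name__ == "__main__":
    image = ("http://i.cdn.turner.com/cnn/2009/TRAVEL/10/26/"
             "overseas.visitors.travel/c1main.liberty.gi.jpg")
    rewritten = to_archived(image, "20091027043308")
    print(rewritten)
    print(to_original(rewritten))
```

The inverse function is the "implicit from URI" linkage from past to present: the original URI can be recovered only because the archive embeds it in its own URI.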




Many Archives/Caches Do Not Rewrite URIs




Cached version of cnn.com (html only):
http://webcache.googleusercontent.com/search?q=cache%3Acnn.com
But images, for example, are not relative to the search engine (SE) cache; they're still at:
http://i2.cdn.turner.com/cnn/2010/POLITICS/09/23/un.ahmadinejad.walkouts/t1main.ahmadinejad.afp.gi.jpg

Some Web Sites Are Just "scp -r"
      (implicit archives!)




http://www.jcdl2007.org/                                   http://www.jcdl.org/archived-conf-sites/jcdl2007/




URI Rewriting is Great --
                           Until Something Goes Wrong…




http://web.archive.org/web/20080302121117/http://www.thecribs.com/

                                      http://web.archive.org/web/20100923232312/http://www.thecribs.com/aa/banners/itunes.gif

Where Else Could …/itunes.gif Be?


              Paradox: URI rewriting makes archives
              useful for interactive browsing, but it
              actively inhibits interoperability -- your
              session becomes trapped in an archive


                How can you escape the gravitational
                pull of IA's Wayback Machine and other
                large archives? You'd like to start an
                archive, but yours will never be as "good"
                as theirs…

Long Tail of Archives




More Archives, More Mementos!




1000 URIs sampled from delicious.com; 1 dot = 1 Memento (x-axis=date of Memento,
y-axis=URI of Original Resource); sorted by URI longevity
How Much of the Web is Archived? JCDL 2011
For Some Collections, Still Too Few Mementos To Be Found…




             1000 URIs sampled from search engine result pages;
             preference for popular pages removed.
             note to self: it is better to be popular.
How Much of the Web is Archived? JCDL 2011
More archives reduce
 archival uncertainty




No Uncertainty With Self-Archiving Systems
                                 foo.html has <img src=pic.gif>

    t0          t1          t2          t3           t4          t5           t6         t7
    |           |           |           |            |           |            |          |
 foo.html   foo.html                             foo.html                            foo.html

 pic.gif                                                      pic.gif     pic.gif    pic.gif




foo.html @ t4
                                foo.html has <img src=pic.gif>

   t0          t1          t2          t3           t4          t5           t6         t7
   |           |           |           |            |           |            |          |
foo.html   foo.html                             foo.html                            foo.html

pic.gif                                                      pic.gif     pic.gif    pic.gif



             GET /foo.html                               GET /pic.gif
             Accept-Datetime: t4                         Accept-Datetime: t4

             HTTP/1.1 200 OK                             HTTP/1.1 200 OK
             Memento-Datetime: t4                        Memento-Datetime: t0




foo.html @ t4
                                   foo.html has <img src=pic.gif>

   t0          t1             t2          t3           t4          t5           t6         t7
   |           |              |           |            |           |            |          |
foo.html   foo.html                                foo.html                            foo.html

pic.gif                                                         pic.gif     pic.gif    pic.gif



             GET /foo.html                                  GET /pic.gif
             Accept-Datetime: t4                            Accept-Datetime: t4

             HTTP/1.1 200 OK                                HTTP/1.1 200 OK
             Memento-Datetime: t4                           Memento-Datetime: t0

                      foo.html correct                          pic.gif correct
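
The selection in this example can be sketched as "latest version at or before the requested time". The version times are read off the diagram above and written as integers (t0 is 0, and so on); the function name version_at is invented for illustration and is not a claim about how any particular server implements datetime negotiation.

```python
from bisect import bisect_right

# Version times from the diagram above, as integer indices (t0 -> 0, ...).
FOO_VERSIONS = [0, 1, 4, 7]
PIC_VERSIONS = [0, 5, 6, 7]


def version_at(versions, requested):
    """Return the latest version time <= requested, or None if none exists.

    versions must be sorted in ascending order.
    """
    index = bisect_right(versions, requested)
    return versions[index - 1] if index else None


if __name__ == "__main__":
    print("foo.html @ t4 -> t%d" % version_at(FOO_VERSIONS, 4))
    print("pic.gif  @ t4 -> t%d" % version_at(PIC_VERSIONS, 4))
```

Because a self-archiving system records every change, the t0 copy of pic.gif really was the one in use at t4, so both answers are correct.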



Uncertainty in Third-Party Archives
                                foo.html has <img src=pic.gif>

   t0          t1          t2          t3           t4          t5           t6         t7
   |           |           |           |            |           |            |          |
foo.html   foo.html                             foo.html                            foo.html

pic.gif                                                      pic.gif     pic.gif    pic.gif




Missed Updates
                                foo.html has <img src=pic.gif>

   t0           t1         t2          t3           t4          t5           t6         t7
   |            |          |           |            |           |            |          |
foo.html   foo.html                             foo.html     foo.html*              foo.html

pic.gif    pic.gif*                             pic.gif*     pic.gif     pic.gif    pic.gif

                         * = missed updates (red italics on the original slide)




foo.html @ t4
                                foo.html has <img src=pic.gif>

   t0           t1         t2          t3           t4          t5           t6         t7
   |            |          |           |            |           |            |          |
foo.html   foo.html                             foo.html     foo.html               foo.html

pic.gif    pic.gif                              pic.gif      pic.gif     pic.gif    pic.gif



             GET /foo.html                               GET /pic.gif
             Accept-Datetime: t4                         Accept-Datetime: t4

             HTTP/1.1 200 OK                             HTTP/1.1 200 OK
             Memento-Datetime: t4                        Memento-Datetime: t0




foo.html @ t4
                                   foo.html has <img src=pic.gif>

   t0           t1            t2          t3           t4          t5           t6         t7
   |            |             |           |            |           |            |          |
foo.html   foo.html                                foo.html     foo.html               foo.html

pic.gif    pic.gif                                 pic.gif      pic.gif     pic.gif    pic.gif



             GET /foo.html                                  GET /pic.gif
             Accept-Datetime: t4                            Accept-Datetime: t4

             HTTP/1.1 200 OK                                HTTP/1.1 200 OK
             Memento-Datetime: t4                           Memento-Datetime: t0

                      foo.html correct                           pic.gif incorrect
                                                                 (should be t4)


foo.html @ t4
                                   foo.html has <img src=pic.gif>

   t0           t1            t2          t3           t4          t5           t6         t7
   |            |             |           |            |           |            |          |
foo.html   foo.html                                foo.html     foo.html               foo.html

pic.gif    pic.gif                                 pic.gif      pic.gif     pic.gif    pic.gif



             GET /foo.html                                  GET /pic.gif
             Accept-Datetime: t4                            Accept-Datetime: t4

             HTTP/1.1 200 OK                                HTTP/1.1 200 OK
             Memento-Datetime: t4                           Memento-Datetime: t0

                      foo.html correct                           pic.gif incorrect
                                                                 (should be t4)
                      this combination (foo@t4, pic@t0) never existed!
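
The mismatch can be reproduced by running the same "latest version at or before the requested time" rule against two histories: what actually changed on the server and what the third-party archive captured. Both lists are one reading of the diagrams in this deck (the archive's captures are taken from the "Uncertainty in Third-Party Archives" diagram), and all names below are hypothetical.

```python
from bisect import bisect_right

# Changes that actually happened on the server (t0 -> 0, ...).
ACTUAL = {"foo.html": [0, 1, 4, 5, 7], "pic.gif": [0, 1, 4, 5, 6, 7]}
# Changes the third-party archive managed to capture.
OBSERVED = {"foo.html": [0, 1, 4, 7], "pic.gif": [0, 5, 6, 7]}


def version_at(versions, requested):
    """Return the latest version time <= requested, or None if none exists."""
    index = bisect_right(versions, requested)
    return versions[index - 1] if index else None


def composite(history, requested):
    """Pick a version of every resource for the requested time."""
    return {name: version_at(times, requested) for name, times in history.items()}


def mismatches(requested):
    """Map each resource to (served, true) version where the two differ."""
    served = composite(OBSERVED, requested)
    truth = composite(ACTUAL, requested)
    return {
        name: (served[name], truth[name])
        for name in served
        if served[name] != truth[name]
    }


if __name__ == "__main__":
    print("archive serves:", composite(OBSERVED, 4))
    print("mismatches:    ", mismatches(4))
```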

Decrease Uncertainty With More Observations?
                                   foo.html has <img src=pic.gif>

      t0           t1         t2          t3           t4          t5           t6         t7
      |            |          |           |            |           |            |          |
   foo.html   foo.html                             foo.html     foo.html*              foo.html

   pic.gif    pic.gif*                             pic.gif*     pic.gif     pic.gif    pic.gif

                            * = missed updates (red italics on the original slide)




Reaching Through Time

% grep "^GET /web/20.*HTTP/1.1" cnn-ia-headers | awk -F"/" '{print $3}' | sort -u
20091026133351js_
20091026133356
20091026133359js_
20091026133425
20091026133427
20091026133430js_
20091026133438
20091026133441
20091026133443
20091026133446
20091026133448
…[deletia]…
20091027220018
20091027220027
20091027220237
20091027220248
20091027224745
20100923125259          ???
20100923125330          ???

first was: 2009-10-26 13:33:51
root was: 2009-10-27 04:33:08
end was: 2009-10-27 22:47:45
root - first ~= 15 hours
end - first ~= 33 hours
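
The "root - first" and "end - first" figures follow directly from the 14-digit timestamps. A small sketch of that arithmetic, with invented function names; the three timestamps are the first, root, and end values from this slide.

```python
from datetime import datetime


def parse_stamp(token):
    """Parse a 14-digit Wayback timestamp, ignoring a suffix such as 'js_'."""
    return datetime.strptime(token[:14], "%Y%m%d%H%M%S")


def spread_hours(tokens):
    """Hours between the earliest and latest timestamps in tokens."""
    stamps = [parse_stamp(token) for token in tokens]
    return (max(stamps) - min(stamps)).total_seconds() / 3600


if __name__ == "__main__":
    first, root, end = "20091026133351js_", "20091027043308", "20091027224745"
    print("root - first: %.1f hours" % spread_hours([first, root]))
    print("end - first:  %.1f hours" % spread_hours([first, end]))
```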



http://web.archive.org/web/20091027043308/http://www.cnn.com/index.html

~33 Hours? How About ~8 Years?




          single archive only                                              with multiple archives


see: http://spread.cs.odu.edu/root/http%3A%252F%252Fanthraxinvestigation.com%252Findex.html/

Publishing and archiving are in a race.
        Publishing is winning.




Ajax = #noarchive




http://web.archive.org/web/*/http://maps.google.com/
http://web.archive.org/web/20091026210613/http://maps.google.com/
http://web.archive.org/web/20091026210613/http://maps.google.com/?output=html&oi=slow
Reaching Out From the Archive

                                           % grep Host: cnn-ia-headers | wc -l
                                             288
                                           % grep Host: cnn-ia-headers | grep -v archive.org | wc -l
                                             117
                                           % grep Host: cnn-ia-headers | grep -v archive.org | sort -u
                                           Host: ad.doubleclick.net
                                           Host: ads.adsonar.com
                                           Host: ads.cnn.com
                                           Host: aranet.vo.llnwd.net
                                           Host: b.scorecardresearch.com
                                           Host: bs.serving-sys.com
                                           Host: cnn.dyn.cnn.com
                                           Host: ds.serving-sys.com
                                           Host: gdyn.cnn.com
                                           Host: i.cdn.turner.com
                                           Host: i2.cdn.turner.com
                                           Host: js.adsonar.com
                                           Host: metrics.cnn.com
                                           Host: pix04.revsci.net
                                           Host: s0.2mdn.net
                                           Host: symbolcomplete.marketwatch.com
                                           Host: www.adfusion.com
http://web.archive.org/web/20091027043308/http://www.cnn.com/index.html
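The grep pipeline above can be approximated in a few lines of Python. This is only a sketch: it assumes `cnn-ia-headers` is a plain-text dump of request headers, the function name `external_hosts` is made up for illustration, and it matches on the host's domain suffix rather than on any line containing "archive.org" as `grep -v` does.

```python
def external_hosts(header_dump: str, archive_domain: str = "archive.org") -> list[str]:
    """Return the sorted, de-duplicated Host values that are outside the archive."""
    hosts = set()
    for line in header_dump.splitlines():
        name, sep, value = line.partition(":")
        if not sep or name.strip().lower() != "host":
            continue
        # Drop an optional port, then normalise case.
        host = value.strip().split(":")[0].lower()
        if host == archive_domain or host.endswith("." + archive_domain):
            continue
        hosts.add(host)
    return sorted(hosts)


if __name__ == "__main__":
    sample = (
        "GET /web/20091027043308/http://www.cnn.com/index.html HTTP/1.1\n"
        "Host: web.archive.org\n\n"
        "GET /ads.js HTTP/1.1\n"
        "Host: ads.cnn.com\n"
    )
    print(external_hosts(sample))
```

Every host this returns is a request that left the archive and reached the live web while the memento was being rendered.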

Embedded Resources




29 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.youtube.com/user/wichitarecordings


How Much Of What We Share Is Preservable?




  local copy of http://dctheatrescene.com/


                                                                                  same, but with no internet

Social Resources




http://www.flickr.com/photos/mic_n_2_sugars/84882320/
1 Memento: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.flickr.com/photos/mic_n_2_sugars/84882320/
http://farm1.static.flickr.com/37/84882320_67fc8915d5_z.jpg (Last-Modified: 10 Jan 2006…)
0 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://farm1.static.flickr.com/37/84882320_67fc8915d5_z.jpg
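The memento counts quoted on these slides come from TimeMaps. As a rough sketch, assuming the TimeMap has already been fetched as application/link-format text, entries whose rel value includes "memento" could be counted as follows. The function name, the regular expressions, and the sample TimeMap are illustrative assumptions; they are not the aggregator's actual output or code.

```python
import re

# One link-value: <URI> followed by zero or more ;name=value parameters.
# Quoted values may contain commas (e.g. datetime="Tue, 10 Jan 2006 ...").
LINK_RE = re.compile(r'<([^>]+)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]*))*)')
REL_RE = re.compile(r'\brel\s*=\s*"?([^";,]*)"?')


def count_mementos(timemap: str) -> int:
    """Count link-values whose rel tokens include 'memento'."""
    count = 0
    for match in LINK_RE.finditer(timemap):
        rel = REL_RE.search(match.group(2))
        if rel and "memento" in rel.group(1).split():
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical TimeMap body, made up for this example.
    sample = (
        '<http://www.example.org/photo>;rel="original",\n'
        '<http://archive.example/timemap/http://www.example.org/photo>'
        ';rel="self";type="application/link-format",\n'
        '<http://archive.example/20060110000000/http://www.example.org/photo>'
        ';rel="first last memento";datetime="Tue, 10 Jan 2006 00:00:00 GMT"\n'
    )
    print(count_mementos(sample))
```

Running the same count against the page URI and against the image URI is what separates "1 Memento" from "0 Mementos" above.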

Archiving a user experience,
  not the user experience.




Personalized Resources


GET / HTTP/1.1
Host: bit.ly
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: __utma=126736798.4156477295523165000.1251253806.1285119293.1285122783.59;
_bit=4c20df7a-003a5-07baf-91a08fa8;anon_u=cHN1X19jN2MwNjcxZC05MWNiLTQ3MmEtOGIxYy1hZDMyMWRlNzc1OTU=|
1284997489|06ac0cefc8ac369e0f9849b5fdfbbe8d077d0c65; user=cGhvbmVkdWRl|1284997489|
fdb7f02cacb3cb44416f54d83f3237ec0f7bd9b5; __utmz=126736798.1280940647.33.1.utmcsr=(direct)|utmccn=(direct)|
utmcmd=(none); _chartbeat2=ciuph6qrso6tn6w7; _xsrf=49bc661fc02845b3bcbe975d7c2f28de;
__utmb=126736798.3.10.1285122783; __utmc=126736798




Geolocated Resources

   % curl -I http://www.craigslist.org
   HTTP/1.1 302 Found
   Set-Cookie: cl_b=12851300231056905752;path=/;domain=.craigslist.org;expires=01 Jan 2038 00:00:00 GMT
   Location: http://geo.craigslist.org/

   % curl -I http://geo.craigslist.org/
   HTTP/1.1 302 Found
   Content-Type: text/html; charset=iso-8859-1
   Connection: close
   Location: http://norfolk.craigslist.org
   Date: Wed, 22 Sep 2010 04:33:56 GMT
   Set-Cookie: cl_b=12851300363085180962;path=/;domain=.craigslist.org;expires=01 Jan 2038 00:00:00 GMT
   Server: Apache

   % traceroute geo.craigslist.org
   traceroute to geo.craigslist.org (208.82.236.208), 64 hops max, 40 byte packets
    1 ***
    2 10.5.120.1 (10.5.120.1) 9.959 ms 23.004 ms 13.208 ms
    3 nrfksysr02-atm151208.hr.hr.cox.net (68.10.8.117) 10.056 ms 10.561 ms 19.970 ms
    4 nrfkdsrj01-ge500.0.rd.hr.cox.net (68.10.14.13) 11.142 ms 20.618 ms 10.293 ms
    5 ashbbprj02-ae4.0.rd.as.cox.net (68.1.1.232) 15.368 ms 68.854 ms 20.153 ms
    6 xe-3-0-0.cr2.dca2.above.net (64.125.26.241) 18.963 ms 23.674 ms 32.977 ms
    7 xe-2-2-0.cr2.iah1.us.above.net (64.125.30.53) 46.201 ms 56.156 ms 46.783 ms
    8 xe-1-1-0.mpr4.phx2.us.above.net (64.125.28.73) 82.616 ms 82.289 ms 84.383 ms
    9 * 64.124.178.62.allocated.above.net (64.124.178.62) 80.893 ms 78.786 ms
   10 511.ae9.ecore1p.craigslist.org (208.82.239.102) 95.958 ms 86.160 ms 90.115 ms
   11 www.craigslist.org (208.82.236.208) 80.968 ms 91.470 ms 80.110 ms
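The two curl calls above show a redirect chain whose last hop is chosen from the client's network location. A small sketch of walking such a chain over recorded `curl -I` output, with no network access, might look like this. The helper names `parse_head` and `redirect_chain` are hypothetical, and the recorded responses are trimmed copies of the ones on the slide.

```python
def parse_head(raw: str) -> tuple[int, dict[str, str]]:
    """Split a raw HTTP response head into (status code, lower-cased headers)."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    status = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


def redirect_chain(recorded: dict[str, str], start: str, limit: int = 10) -> list[str]:
    """Follow Location headers through recorded responses, starting at `start`."""
    chain = [start]
    while chain[-1] in recorded and len(chain) <= limit:
        status, headers = parse_head(recorded[chain[-1]])
        if status not in (301, 302, 303, 307, 308) or "location" not in headers:
            break
        chain.append(headers["location"])
    return chain


if __name__ == "__main__":
    recorded = {
        "http://www.craigslist.org":
            "HTTP/1.1 302 Found\nLocation: http://geo.craigslist.org/\n",
        "http://geo.craigslist.org/":
            "HTTP/1.1 302 Found\nLocation: http://norfolk.craigslist.org\n",
    }
    print(" -> ".join(redirect_chain(recorded, "http://www.craigslist.org")))
```

A crawler on a different network would presumably record a different final hop for the same starting URI, which is the difficulty for an archive.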



Is there just a single web to archive?




Shadow Web: Mobile




46* Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://twitter.com/timoreilly
0 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://mobile.twitter.com/timoreilly
   * = 46 mementos in 2010, 22 mementos in 2012
Shadow Web: Mobile




17,000+ Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.cnn.com/
140+ Mementos:       http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://m.cnn.com/

Shadow Web: Linked Data

http://en.wikipedia.org/wiki/DJ_Shadow

http://dbpedia.org/resource/DJ_Shadow   (this resource intentionally left blank)
   Accept: text/html             ->  http://dbpedia.org/page/DJ_Shadow
   Accept: application/rdf+xml   ->  http://dbpedia.org/data/DJ_Shadow



                      2 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://dbpedia.org/resource/DJ_Shadow
                      0 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://dbpedia.org/data/DJ_Shadow
                      0 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://dbpedia.org/page/DJ_Shadow
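The slide shows one /resource/ URI resolving to either a /page/ or a /data/ URI depending on the Accept request header. A simplified sketch of that selection is below. It is not DBpedia's server logic: the function name `negotiate`, the two-entry variant table, and the fallback to HTML are all assumptions made for illustration.

```python
def negotiate(resource_uri: str, accept: str) -> str:
    """Pick the /page/ or /data/ URI for a /resource/ URI from an Accept header."""
    variants = {"text/html": "/page/", "application/rdf+xml": "/data/"}
    best_type, best_q = None, 0.0
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        media = parts[0].lower()
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media == "*/*":
            candidates = list(variants)
        elif media in variants:
            candidates = [media]
        else:
            candidates = []
        for candidate in candidates:
            if q > best_q:
                best_type, best_q = candidate, q
    if best_type is None:
        best_type = "text/html"  # assumption: fall back to the HTML variant
    return resource_uri.replace("/resource/", variants[best_type], 1)


if __name__ == "__main__":
    uri = "http://dbpedia.org/resource/DJ_Shadow"
    print(negotiate(uri, "text/html"))
    print(negotiate(uri, "application/rdf+xml"))
```

Because the three URIs are distinct, an archive that captured only one of them has no mementos for the other two, which matches the counts listed above.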

A short wish list.




Use Memento-Datetime HTTP Header


                             % curl -I https://twitter.com/machawk1/status/218015444496416768
                             HTTP/1.1 200 OK
                             Date: Fri, 29 Jun 2012 15:50:55 GMT
                             Status: 200 OK
                             X-Transaction: 9a209a8deb15f4ba
                             X-Frame-Options: SAMEORIGIN
                             ETag: "4b79affd0f77a83019f619428f4ebaa5"
                             Expires: Tue, 31 Mar 1981 05:00:00 GMT
                             Last-Modified: Fri, 29 Jun 2012 15:50:55 GMT
                             Content-Type: text/html; charset=utf-8
                             X-Runtime: 0.20638
                             Cache-Control: no-cache, no-store, must-revalidate,
                               pre-check=0, post-check=0
                             Content-Length: 80501
                             Pragma: no-cache
                             Strict-Transport-Security: max-age=631138519
                             X-MID: 75f6c6061c2be34447493adc6c33317c61740b5f
                             Set-Cookie: [cookie stuff deleted]
                             X-XSS-Protection: 1; mode=block
                             Vary: Accept-Encoding
                             Server: tfe
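In the response above, Last-Modified simply repeats Date, so it says nothing about when the tweet was actually created. One reading of the wish on this slide is that a server holding a fixed resource could state that datetime itself. A minimal sketch of producing such a header value follows; the function name is illustrative, and the datetime in the example is a placeholder copied from the Date header above, not the tweet's real creation time.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_datetime_header(created: datetime) -> tuple[str, str]:
    """Return a (name, value) pair with the datetime in HTTP-date (GMT) form."""
    if created.tzinfo is None:
        raise ValueError("created must be a timezone-aware datetime")
    utc = created.astimezone(timezone.utc)
    return ("Memento-Datetime", format_datetime(utc, usegmt=True))


if __name__ == "__main__":
    # Placeholder value: reuses the Date shown in the curl output above.
    created = datetime(2012, 6, 29, 15, 50, 55, tzinfo=timezone.utc)
    name, value = memento_datetime_header(created)
    print(f"{name}: {value}")
```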




Richer APIs for Archives
outlinks, with context, datestamps
inlinks, with context, datestamps

<?xml version="1.0"?>
<URLEnvelope>
  <URL text="http://www.example.org" />
  <outlinks>
    <timestamp value="20122020202">
      <olink>
        <href>http://www.anotherexample.org</href>
        <atext>click here</atext>
        <window>In the following example click here we show you example</window>
      </olink>
      <olink>
        <href>http://www.anotherexample2.org</href>
        <atext>click here2</atext>
        <window>In the following example click 2 we show you example</window>
      </olink>
    </timestamp>
  </outlinks>
  <inlinks>
    <ilink>
      <href>http://www.myexample.org</href>
      <atext>Good Example</atext>
      <window>In the following example click here we show you interesting example</window>
      <timestamp value="20122020202"/>
    </ilink>
    <ilink>
      <href>http://www.myexample.org</href>
      <atext>Good Example</atext>
      <window>In the following example click here we show you interesting example</window>
      <timestamp value="201240402042"/>
    </ilink>
  </inlinks>
</URLEnvelope>

possible now, but there is a bootstrapping problem of proving value to the archive
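If an archive exposed an envelope like the one above, a client could pull out the links, their anchor text, and their datestamps with the standard library. The sketch below assumes exactly the element names shown on the slide (URLEnvelope, outlinks, inlinks, olink, ilink, href, atext, timestamp). No such API is claimed to exist, and `read_links` and the shortened `SAMPLE` are illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened version of the envelope on the slide.
SAMPLE = """<URLEnvelope>
  <URL text="http://www.example.org" />
  <outlinks>
    <timestamp value="20122020202">
      <olink>
        <href>http://www.anotherexample.org</href>
        <atext>click here</atext>
      </olink>
    </timestamp>
  </outlinks>
  <inlinks>
    <ilink>
      <href>http://www.myexample.org</href>
      <atext>Good Example</atext>
      <timestamp value="20122020202"/>
    </ilink>
  </inlinks>
</URLEnvelope>"""


def read_links(xml_text: str) -> dict:
    """Extract the subject URL plus outlinks and inlinks with their timestamps."""
    root = ET.fromstring(xml_text)
    outlinks = []
    # Outlinks are grouped under a <timestamp value="..."> wrapper.
    for stamp in root.findall("./outlinks/timestamp"):
        for olink in stamp.findall("olink"):
            outlinks.append({"timestamp": stamp.get("value"),
                             "href": olink.findtext("href"),
                             "atext": olink.findtext("atext")})
    inlinks = []
    # Inlinks carry their own empty <timestamp value="..."/> child.
    for ilink in root.findall("./inlinks/ilink"):
        stamp = ilink.find("timestamp")
        inlinks.append({"timestamp": stamp.get("value") if stamp is not None else None,
                        "href": ilink.findtext("href"),
                        "atext": ilink.findtext("atext")})
    return {"url": root.find("URL").get("text"),
            "outlinks": outlinks, "inlinks": inlinks}


if __name__ == "__main__":
    print(read_links(SAMPLE))
```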



Closing thoughts.




Open Problem: Compelling Applications

• UI / usage idioms
   – remember "lost in hyperspace"?
• What is the right metaphor?
   – VCR controls through versions?
   – "Track Changes" controls?
• Search like it's 1999
   – yeah, we all want search for the Wayback Machine, but like
     a dog chasing a truck…
• If you build the archive APIs, will the archive-based
  mashups come?

"A lot of times, people don't know what they want until you show it to them."
   -- Steve Jobs (BusinessWeek, May 25, 1998)


Open Problem: Conceptual Gap

• What archives offer: access by URI +
  timestamp
  – "what did cnn.com look like on May 31, 2007?"
• What users want: concepts through time
  – "how has public opinion about the `affordable
    health care act' changed through time?"
     • hint: tag clouds aren't enough
     • to answer this question, you would likely have to find a
       current page that talks about the past
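The "URI + timestamp" access model can be illustrated with the Wayback Machine URI pattern that appears elsewhere in this talk (http://web.archive.org/web/{YYYYMMDDhhmmss}/{URI}). This is a minimal sketch: the function name `wayback_uri` is made up, and the claim that the archive answers with the capture closest to the requested datetime is my understanding rather than something stated on the slide.

```python
from datetime import datetime


def wayback_uri(uri: str, when: datetime) -> str:
    """Build a Wayback Machine URI for `uri` at the 14-digit datetime `when`."""
    return f"http://web.archive.org/web/{when:%Y%m%d%H%M%S}/{uri}"


if __name__ == "__main__":
    # "what did cnn.com look like on May 31, 2007?"
    print(wayback_uri("http://www.cnn.com/", datetime(2007, 5, 31)))
```

Nothing comparable exists for the second kind of question: there is no single URI or timestamp to plug in for "public opinion about X through time".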



Open Problem: Archival Authenticity

• Right now, we just implicitly trust Brewster and
  everyone at the IA
• The only reason the politicians/pundits in previous
  examples didn't cheat is because they didn't know it
  was an option
   – black hat archives
• What happens when there are multiple archives and
  they disagree?
   – spam archives?
   – "soft 401s"?
   – resolving archival disputes?
       • esp. if different archives can legitimately see different
         representations for the same URIs?

Open Problem: Monetizing The Archive

• Until $ can be made, archives will labor in the
  shadows
• OTOH, without monetization archives are
  relatively free of spammers, lawyers, and
  other predators




                                Rogers' Innovation Curve
                 http://en.wikipedia.org/wiki/Diffusion_of_innovations


Five Easy Pieces




• Preservation not for privileged priesthood
  http://doi.acm.org/10.1145/1592761.1592794
  http://booktwo.org/notebook/wikipedia-historiography/
• no more hoary stories about format obsolescence:
  http://blog.dshr.org/2010/09/reinforcing-my-point.html
• archiving as branded service, not infrastructure
  http://blog.dshr.org/2010/06/jcdl-2010-keynote.html
• Don't desiccate resources; leave them on the web
  http://www.dlib.org/dlib/december05/12contents.html
• Endless metadata is not preservation…
  [too many to list]



More Related Content

What's hot

WWW2013 Tutorial: Linked Data & Education
WWW2013 Tutorial: Linked Data & EducationWWW2013 Tutorial: Linked Data & Education
WWW2013 Tutorial: Linked Data & EducationStefan Dietze
 
Vila LOD-innovacion- bib-semweb-redux
Vila LOD-innovacion- bib-semweb-reduxVila LOD-innovacion- bib-semweb-redux
Vila LOD-innovacion- bib-semweb-reduxLIS EPI Meeting
 
Semantic Web Landscape 2009
Semantic Web Landscape 2009Semantic Web Landscape 2009
Semantic Web Landscape 2009
LeeFeigenbaum
 
Open Sesame (and other open movements)
Open Sesame (and other open movements)Open Sesame (and other open movements)
Open Sesame (and other open movements)
Dorothea Salo
 
Info Management2.0
Info Management2.0Info Management2.0
Info Management2.0
electriclibrarian
 
Humanities Crowdsourcing on the Zooniverse Platform
Humanities Crowdsourcing on the Zooniverse PlatformHumanities Crowdsourcing on the Zooniverse Platform
Humanities Crowdsourcing on the Zooniverse Platform
UCLDH
 
library 2.0
library 2.0library 2.0
library 2.0
Lifelong Learning
 
Cultural heritage collections in a web 2
Cultural heritage collections in a web 2Cultural heritage collections in a web 2
Cultural heritage collections in a web 2
Lynne Thomas
 
Puzzled by Wikis And Blogs?
Puzzled by Wikis And Blogs?Puzzled by Wikis And Blogs?
Puzzled by Wikis And Blogs?
electriclibrarian
 

What's hot (10)

WWW2013 Tutorial: Linked Data & Education
WWW2013 Tutorial: Linked Data & EducationWWW2013 Tutorial: Linked Data & Education
WWW2013 Tutorial: Linked Data & Education
 
Vila LOD-innovacion- bib-semweb-redux
Vila LOD-innovacion- bib-semweb-reduxVila LOD-innovacion- bib-semweb-redux
Vila LOD-innovacion- bib-semweb-redux
 
Semantic Web Landscape 2009
Semantic Web Landscape 2009Semantic Web Landscape 2009
Semantic Web Landscape 2009
 
Open Sesame (and other open movements)
Open Sesame (and other open movements)Open Sesame (and other open movements)
Open Sesame (and other open movements)
 
Info Management2.0
Info Management2.0Info Management2.0
Info Management2.0
 
Wiser2009 Luis Martinez
Wiser2009 Luis MartinezWiser2009 Luis Martinez
Wiser2009 Luis Martinez
 
Humanities Crowdsourcing on the Zooniverse Platform
Humanities Crowdsourcing on the Zooniverse PlatformHumanities Crowdsourcing on the Zooniverse Platform
Humanities Crowdsourcing on the Zooniverse Platform
 
library 2.0
library 2.0library 2.0
library 2.0
 
Cultural heritage collections in a web 2
Cultural heritage collections in a web 2Cultural heritage collections in a web 2
Cultural heritage collections in a web 2
 
Puzzled by Wikis And Blogs?
Puzzled by Wikis And Blogs?Puzzled by Wikis And Blogs?
Puzzled by Wikis And Blogs?
 

Viewers also liked

Tools for A Preservation Ready Web
Tools for A Preservation Ready WebTools for A Preservation Ready Web
Tools for A Preservation Ready Web
Michael Nelson
 
Review of Web Archiving
Review of Web ArchivingReview of Web Archiving
Review of Web Archiving
Michael Nelson
 
(Re-)Discovering Lost Web Pages
(Re-)Discovering Lost Web Pages(Re-)Discovering Lost Web Pages
(Re-)Discovering Lost Web Pages
Michael Nelson
 
The Open Archives Initiative
The Open Archives InitiativeThe Open Archives Initiative
The Open Archives Initiative
Michael Nelson
 
Synchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web PagesSynchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web Pages
Michael Nelson
 
Memento: Time Travel for the Web
Memento: Time Travel for the WebMemento: Time Travel for the Web
Memento: Time Travel for the Web
Michael Nelson
 
Memento: Time Travel for the Web
Memento: Time Travel for the WebMemento: Time Travel for the Web
Memento: Time Travel for the Web
Michael Nelson
 
(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages
Michael Nelson
 
My Point of View: Michael L. Nelson Web Archiving Cooperative
My Point of View: Michael L. Nelson  Web Archiving CooperativeMy Point of View: Michael L. Nelson  Web Archiving Cooperative
My Point of View: Michael L. Nelson Web Archiving Cooperative
Michael Nelson
 
Using timed-release cryptography to mitigate the preservation risk of embargo...
Using timed-release cryptography to mitigate the preservation risk of embargo...Using timed-release cryptography to mitigate the preservation risk of embargo...
Using timed-release cryptography to mitigate the preservation risk of embargo...
Michael Nelson
 
Can’t Find Your 404s?
Can’t Find Your 404s?Can’t Find Your 404s?
Can’t Find Your 404s?
Michael Nelson
 
Memento: TimeGates, TimeBundles, and TimeMaps
Memento: TimeGates, TimeBundles, and TimeMapsMemento: TimeGates, TimeBundles, and TimeMaps
Memento: TimeGates, TimeBundles, and TimeMaps
Michael Nelson
 
Memento: Time Travel for the Web
Memento: Time Travel for the WebMemento: Time Travel for the Web
Memento: Time Travel for the Web
Michael Nelson
 
Music Video Redundancy and Half-Life in YouTube
Music Video Redundancy and Half-Life in YouTubeMusic Video Redundancy and Half-Life in YouTube
Music Video Redundancy and Half-Life in YouTube
Michael Nelson
 
Why Care About the Past?
Why Care About the Past?Why Care About the Past?
Why Care About the Past?
Michael Nelson
 
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange Project
OAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange ProjectOAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange Project
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange Project
Michael Nelson
 

Viewers also liked (16)

Tools for A Preservation Ready Web
Tools for A Preservation Ready WebTools for A Preservation Ready Web
Tools for A Preservation Ready Web
 
Review of Web Archiving
Review of Web ArchivingReview of Web Archiving
Review of Web Archiving
 
(Re-)Discovering Lost Web Pages
(Re-)Discovering Lost Web Pages(Re-)Discovering Lost Web Pages
(Re-)Discovering Lost Web Pages
 
The Open Archives Initiative
The Open Archives InitiativeThe Open Archives Initiative
The Open Archives Initiative
 
Synchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web PagesSynchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web Pages
 
Memento: Time Travel for the Web
Memento: Time Travel for the WebMemento: Time Travel for the Web
Memento: Time Travel for the Web
 
Memento: Time Travel for the Web
Memento: Time Travel for the WebMemento: Time Travel for the Web
Memento: Time Travel for the Web
 
(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages
 
My Point of View: Michael L. Nelson Web Archiving Cooperative
My Point of View: Michael L. Nelson  Web Archiving CooperativeMy Point of View: Michael L. Nelson  Web Archiving Cooperative
My Point of View: Michael L. Nelson Web Archiving Cooperative
 
Using timed-release cryptography to mitigate the preservation risk of embargo...
Using timed-release cryptography to mitigate the preservation risk of embargo...Using timed-release cryptography to mitigate the preservation risk of embargo...
Using timed-release cryptography to mitigate the preservation risk of embargo...
 
Can’t Find Your 404s?
Can’t Find Your 404s?Can’t Find Your 404s?
Can’t Find Your 404s?
 
Memento: TimeGates, TimeBundles, and TimeMaps
Memento: TimeGates, TimeBundles, and TimeMapsMemento: TimeGates, TimeBundles, and TimeMaps
Memento: TimeGates, TimeBundles, and TimeMaps
 
Memento: Time Travel for the Web
Memento: Time Travel for the WebMemento: Time Travel for the Web
Memento: Time Travel for the Web
 
Music Video Redundancy and Half-Life in YouTube
Music Video Redundancy and Half-Life in YouTubeMusic Video Redundancy and Half-Life in YouTube
Music Video Redundancy and Half-Life in YouTube
 
Why Care About the Past?
Why Care About the Past?Why Care About the Past?
Why Care About the Past?
 
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange Project
OAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange ProjectOAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange Project
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange Project
 

Similar to A Research Agenda for "Obsolete Data or Resources"

Oregon Digital: Collaborative Hydra Development
Oregon Digital: Collaborative Hydra DevelopmentOregon Digital: Collaborative Hydra Development
Oregon Digital: Collaborative Hydra Development
Karen Estlund
 
Linked data presentation for libraries (COMO)
Linked data presentation for libraries (COMO)Linked data presentation for libraries (COMO)
Linked data presentation for libraries (COMO)
robin fay
 
BlogMyData overview presentation
BlogMyData overview presentationBlogMyData overview presentation
BlogMyData overview presentation
jonblower
 
Exploring the Semantic Web
Exploring the Semantic WebExploring the Semantic Web
Exploring the Semantic Web
Roberto García
 
A Clean Slate?
A Clean Slate?A Clean Slate?
A Clean Slate?
Herbert Van de Sompel
 
Research Data Management Fundamentals for MSU Engineering Students
Research Data Management Fundamentals for MSU Engineering StudentsResearch Data Management Fundamentals for MSU Engineering Students
Research Data Management Fundamentals for MSU Engineering Students
Aaron Collie
 
Research Data Curation _ Grad Humanities Class
Research Data Curation _ Grad Humanities ClassResearch Data Curation _ Grad Humanities Class
Research Data Curation _ Grad Humanities Class
Aaron Collie
 
Repository Federation: Towards Data Interoperability
Repository Federation: Towards Data InteroperabilityRepository Federation: Towards Data Interoperability
Repository Federation: Towards Data Interoperability
Robert H. McDonald
 
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Artefactual Systems - AtoM
 
Activists archiving digital content created through OWS - AMIA - 2012
Activists archiving digital content created through OWS - AMIA - 2012Activists archiving digital content created through OWS - AMIA - 2012
Activists archiving digital content created through OWS - AMIA - 2012
Anna Perricci
 
Beyond Open Access: Open Data, Web services, and Semantics (the Open Context ...
Beyond Open Access: Open Data, Web services, and Semantics (the Open Context ...Beyond Open Access: Open Data, Web services, and Semantics (the Open Context ...
Beyond Open Access: Open Data, Web services, and Semantics (the Open Context ...
Sarah Whitcher Kansa
 
Hiberlink: Prototypes of pro-active approaches to support the archiving of we...
Hiberlink: Prototypes of pro-active approaches to support the archiving of we...Hiberlink: Prototypes of pro-active approaches to support the archiving of we...
Hiberlink: Prototypes of pro-active approaches to support the archiving of we...
EDINA, University of Edinburgh
 
Fox-Keynote-Now and Now of Data Publishing-nfdp13
Fox-Keynote-Now and Now of Data Publishing-nfdp13Fox-Keynote-Now and Now of Data Publishing-nfdp13
Fox-Keynote-Now and Now of Data Publishing-nfdp13
DataDryad
 
Prototypes of pro-active approaches to support the archiving of web reference...
Prototypes of pro-active approaches to support the archiving of web reference...Prototypes of pro-active approaches to support the archiving of web reference...
Prototypes of pro-active approaches to support the archiving of web reference...
EDINA, University of Edinburgh
 
Linking Open Government Data at Scale
Linking Open Government Data at Scale Linking Open Government Data at Scale
Linking Open Government Data at Scale
Bernadette Hyland-Wood
 
HIBERLINK: Reference Rot and Linked Data: Threat and Remedy
HIBERLINK: Reference Rot and Linked Data: Threat and RemedyHIBERLINK: Reference Rot and Linked Data: Threat and Remedy
HIBERLINK: Reference Rot and Linked Data: Threat and Remedy
PRELIDA Project
 
Reference Rot and Linked Data: Threat and Remedy
Reference Rot and Linked Data: Threat and RemedyReference Rot and Linked Data: Threat and Remedy
Reference Rot and Linked Data: Threat and Remedy
EDINA, University of Edinburgh
 
Preservation for the Next Generation
Preservation for the Next GenerationPreservation for the Next Generation
Preservation for the Next Generation
jiscpowr
 
Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...
Ken Karapetyan
 
Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...Royal society of chemistry activities to develop a data repository for chemis...
Royal society of chemistry activities to develop a data repository for chemis...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 

Similar to A Research Agenda for "Obsolete Data or Resources" (20)

Oregon Digital: Collaborative Hydra Development
Oregon Digital: Collaborative Hydra DevelopmentOregon Digital: Collaborative Hydra Development
Oregon Digital: Collaborative Hydra Development
 
Linked data presentation for libraries (COMO)
Linked data presentation for libraries (COMO)Linked data presentation for libraries (COMO)
Linked data presentation for libraries (COMO)
 
BlogMyData overview presentation
BlogMyData overview presentationBlogMyData overview presentation
BlogMyData overview presentation
 
Exploring the Semantic Web
Exploring the Semantic WebExploring the Semantic Web
Exploring the Semantic Web
 
A Clean Slate?
A Clean Slate?A Clean Slate?
A Clean Slate?
 
Research Data Management Fundamentals for MSU Engineering Students
Research Data Management Fundamentals for MSU Engineering StudentsResearch Data Management Fundamentals for MSU Engineering Students
Research Data Management Fundamentals for MSU Engineering Students
 
Research Data Curation _ Grad Humanities Class
Research Data Curation _ Grad Humanities ClassResearch Data Curation _ Grad Humanities Class
Research Data Curation _ Grad Humanities Class
 
Repository Federation: Towards Data Interoperability
Repository Federation: Towards Data InteroperabilityRepository Federation: Towards Data Interoperability
Repository Federation: Towards Data Interoperability
 
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
 
A Research Agenda for "Obsolete Data or Resources"

  • 1. A Research Agenda for "Obsolete Data or Resources". Michael L. Nelson, @phonedude_mln. Web Archiving Cooperative Workshop, Stanford, June 29, 2012.
  • 2. Biographical Side Note…
  • 3. Growing Up in Virginia…
  • 4. First Job: NASA Langley Research Center
  • 5. My Research Group. Be Lazy: lazy preservation, just-in-time preservation. Get Active: modify server, enhance objects. Archive Quality: APIs and services, object quality. Better Tools: ajax archiving, temporal intention, personal archiving.
  • 6. Why Care About The Past? From an anonymous WWW 2010 reviewer about our Memento paper (emphasis mine): "Is there any statistics to show that many or a good number of Web users would like to get obsolete data or resources?"
  • 7. Two Common Misconceptions about Web Archiving. (1) Prior = old = obsolete = bad = contaminated: who cares, old versions are to be removed. (2) The Internet Archive has every copy of everything that has ever existed: who cares, problem solved.
  • 8. Current pages about the past don't have the same impact as pages from the past
  • 9. [no slide text]
  • 10. vs. (thanks to Michele Weigle for the following Memento selection)
  • 11. [no slide text]
  • 12. [no slide text]
  • 13. What have we, the archiving community, done wrong?
  • 14. Wrong Metaphor for Web Archives
  • 15. Web Archives Are Not Destinations. This is a destination. This is not a destination.
  • 16. Possible Metaphor for Web Archives?
  • 17. Turn Archiving Into A Social Activity (see also: http://xkcd.com/1034/)
  • 18. Pinterest: A First Step? http://media-cache-ec3.pinterest.com/upload/47639708527755289_AhxhItiQ_c.jpg is a memento of http://3.bp.blogspot.com/_d0vByWRfhvU/S_Ygk_oX4xI/AAAAAAAACCQ/LXgC3S0KYEo/s400/_MG_8091.jpg, but there is no machine-readable indication of this relationship. Repins are by-reference.
  • 19. Why doesn't the web have a better notion of time?
  • 20. TBL on Generic vs. Specific Resources: http://www.w3.org/DesignIssues/Generic.html
  • 21. In The Beginning… there was the inode
        struct stat {
            dev_t     st_dev;     /* ID of device containing file */
            ino_t     st_ino;     /* inode number */
            mode_t    st_mode;    /* protection */
            nlink_t   st_nlink;   /* number of hard links */
            uid_t     st_uid;     /* user ID of owner */
            gid_t     st_gid;     /* group ID of owner */
            dev_t     st_rdev;    /* device ID (if special file) */
            off_t     st_size;    /* total size, in bytes */
            blksize_t st_blksize; /* blocksize for filesystem I/O */
            blkcnt_t  st_blocks;  /* number of blocks allocated */
            time_t    st_atime;   /* time of last access */
            time_t    st_mtime;   /* time of last modification */
            time_t    st_ctime;   /* time of last status change */
        };
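The three `time_t` fields above are all the file system remembers about time, and each holds only its latest value. A minimal sketch of reading them, assuming Python's `os.stat` as a stand-in for the C call; the temporary file and the helper names `file_times` and `demo` are illustrative, not part of the talk:

```python
import os
import tempfile
from datetime import datetime, timezone


def file_times(path):
    """Return the inode's three timestamps as UTC datetimes."""
    st = os.stat(path)
    return {
        "atime": datetime.fromtimestamp(st.st_atime, tz=timezone.utc),
        "mtime": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
        "ctime": datetime.fromtimestamp(st.st_ctime, tz=timezone.utc),
    }


def demo():
    """Create a scratch file, backdate its mtime, and read the times back."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"version 1")
        path = f.name
    try:
        # 1245342354 is Thu, 18 Jun 2009 16:25:54 GMT as a Unix timestamp.
        os.utime(path, (0, 1245342354))
        return file_times(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    for name, value in demo().items():
        print(name, value.isoformat())
```

Nothing in the structure records earlier modification times: a second write simply overwrites `st_mtime`.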
  • 22. Limited Time Semantics…
        % telnet www.digitalpreservation.gov 80
        Trying 140.147.249.7...
        Connected to www.digitalpreservation.gov.
        Escape character is '^]'.
        HEAD /images/ndiipp_header6.jpg HTTP/1.1
        Host: www.digitalpreservation.gov
        Connection: close

        HTTP/1.1 200 OK
        Date: Mon, 19 Jul 2010 21:41:04 GMT
        Server: Apache
        Last-Modified: Thu, 18 Jun 2009 16:25:54 GMT
        ETag: "1bc861-10935-dca24880"
        Accept-Ranges: bytes
        Content-Length: 67893
        Connection: close
        Content-Type: image/jpeg

        Connection closed by foreign host.
  • 23. Time Semantics Becoming Less, Not More Available
        % telnet www.digitalpreservation.gov 80
        Trying 140.147.249.7...
        Connected to www.digitalpreservation.gov.
        Escape character is '^]'.
        HEAD / HTTP/1.1
        Host: www.digitalpreservation.gov
        Connection: close

        HTTP/1.1 200 OK
        Date: Mon, 19 Jul 2010 21:36:00 GMT
        Server: Apache
        Accept-Ranges: bytes
        Connection: close
        Content-Type: text/html

        Connection closed by foreign host.
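The difference between the two responses above can be checked mechanically. A minimal sketch, assuming the response headers have been captured as text (the two constants copy the headers from slides 22 and 23); `parse_headers` and `time_semantics` are hypothetical helper names:

```python
from email.utils import parsedate_to_datetime

# Response headers copied from the two telnet sessions above.
IMAGE_RESPONSE = """\
HTTP/1.1 200 OK
Date: Mon, 19 Jul 2010 21:41:04 GMT
Server: Apache
Last-Modified: Thu, 18 Jun 2009 16:25:54 GMT
ETag: "1bc861-10935-dca24880"
Accept-Ranges: bytes
Content-Length: 67893
Connection: close
Content-Type: image/jpeg
"""

HOMEPAGE_RESPONSE = """\
HTTP/1.1 200 OK
Date: Mon, 19 Jul 2010 21:36:00 GMT
Server: Apache
Accept-Ranges: bytes
Connection: close
Content-Type: text/html
"""


def parse_headers(raw):
    """Map lower-cased header names to values, skipping the status line."""
    headers = {}
    for line in raw.strip().splitlines()[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def time_semantics(raw):
    """Return the Date and Last-Modified datetimes, or None when absent."""
    headers = parse_headers(raw)
    result = {}
    for name in ("date", "last-modified"):
        value = headers.get(name)
        result[name] = parsedate_to_datetime(value) if value else None
    return result


if __name__ == "__main__":
    print(time_semantics(IMAGE_RESPONSE))
    print(time_semantics(HOMEPAGE_RESPONSE))
```

The static image still reports when it last changed; the home page reports only when the response was generated.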
  • 24. The Past Links to the Present… explicit HTML link; no HTTP links; opaque URI
  • 25. The Past Links to the Present… no HTML links; no HTTP links; implicit from URI
  • 26. But the Present Does Not Link to the Past: no hints in HTML, HTTP, or URI
        % telnet www.digitalpreservation.gov 80
        Trying 140.147.249.7...
        Connected to www.digitalpreservation.gov.
        Escape character is '^]'.
        HEAD / HTTP/1.1
        Host: www.digitalpreservation.gov
        Connection: close

        HTTP/1.1 200 OK
        Date: Mon, 19 Jul 2010 21:36:00 GMT
        Server: Apache
        Accept-Ranges: bytes
        Connection: close
        Content-Type: text/html

        Connection closed by foreign host.
  • 27. Linking the Past and the Present. (1) Codify existing methods to create linkage from the past to the present. Easy: an archived version knows for which URI it is an archived version. (2) Create a linkage from the present to the past. Hard: solve with a level of indirection from present to past.
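The "level of indirection" in slide 27 can be pictured as a lookup that takes a desired datetime and hands back one archived copy. A minimal sketch under stated assumptions: the selection policy (closest in time) is an assumption rather than something the slides specify, `closest_memento` is a hypothetical name, and the three sample entries are taken from the timemap on slide 33:

```python
from datetime import datetime, timezone

# (capture time, archived URI) pairs, copied from the slide 33 timemap.
MEMENTOS = [
    (datetime(2010, 9, 25, 22, 21, 53, tzinfo=timezone.utc),
     "http://api.wayback.archive.org/memento/20100925222153/http://www.takebackthe20.com/"),
    (datetime(2010, 10, 1, 17, 53, 13, tzinfo=timezone.utc),
     "http://api.wayback.archive.org/memento/20101001175313/http://www.takebackthe20.com/"),
    (datetime(2010, 12, 6, 12, 31, 28, tzinfo=timezone.utc),
     "http://api.wayback.archive.org/memento/20101206123128/http://www.takebackthe20.com/"),
]


def closest_memento(mementos, wanted):
    """Return the (datetime, uri) pair nearest to `wanted`, or None if empty."""
    if not mementos:
        return None
    return min(mementos, key=lambda m: abs(m[0] - wanted))


if __name__ == "__main__":
    wanted = datetime(2011, 1, 8, tzinfo=timezone.utc)
    print(closest_memento(MEMENTOS, wanted))
```

The live resource only needs to point at the lookup service; the service, not the page, knows which copies exist.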
  • 28. The Web with Time Dimension added by Memento
  • 29. The archival record is incomplete
  • 30. Va Tech Shooting -- Only 3 Mementos. Do you remember when it was thought to be a domestic disturbance of limited scope and they had a suspect in custody?
  • 31. Palin Crosshairs and takebackthe20.com. This website was published in fall of 2010. January 8, 2011: 6 dead, 14 wounded, including a critically injured Giffords. Later that day, takebackthe20.com is taken offline (see: http://huff.to/QnHA6x -- it notes the absence of the page in the Wayback Machine without mention of the 6-12 month quarantine).
  • 32. What Was The Original Image? the present web mostly agrees, but there are variations on the theme…
  • 33. Timemap for takebackthe20.com
        % curl http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.takebackthe20.com/
        <http://mementoproxy.cs.odu.edu/aggr/timebundle/http://www.takebackthe20.com/>;rel="timebundle",
        <http://www.takebackthe20.com/>;rel="original",
        <http://http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.takebackthe20.com/> ;rel="timemap";type="application/link-format",
        <http://mementoproxy.cs.odu.edu/aggr/timegate/http://www.takebackthe20.com/>;rel="timegate",
        <http://api.wayback.archive.org/memento/20100925222153/http://www.takebackthe20.com/> ;rel="first memento";datetime="Sat, 25 Sep 2010 22:21:53 GMT",
        <http://api.wayback.archive.org/memento/20100926095121/http://www.takebackthe20.com/>;rel="memento" ;datetime="Sun, 26 Sep 2010 09:51:21 GMT",
        <http://api.wayback.archive.org/memento/20101001175313/http://www.takebackthe20.com/>;rel="memento" ;datetime="Fri, 01 Oct 2010 17:53:13 GMT",
        [deletion of about 11 mementos]
        <http://api.wayback.archive.org/memento/20101202224145/http://www.takebackthe20.com/>;rel="memento" ;datetime="Thu, 02 Dec 2010 22:41:45 GMT",
        <http://api.wayback.archive.org/memento/20101202231759/http://www.takebackthe20.com/>;rel="memento" ;datetime="Thu, 02 Dec 2010 23:17:59 GMT",
        <http://api.wayback.archive.org/memento/20101206123128/http://www.takebackthe20.com/> ;rel="last memento";datetime="Mon, 06 Dec 2010 12:31:28 GMT"
    The last memento is about 1 month before the shooting. Ironically, we can document the original image, but not the post-shooting event. www.takebackthe20.com is now an anti-Palin lapsed domain.
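The "about 1 month" observation can be reproduced from the timemap itself. A minimal sketch, assuming the link-format text has already been fetched; `TIMEMAP` is an excerpt of the entries on slide 33, and the regular expressions and the names `parse_timemap` and `mementos` are illustrative rather than part of any Memento library:

```python
import re
from datetime import date
from email.utils import parsedate_to_datetime

# Excerpt of the timemap on slide 33 (original, timegate, three mementos).
TIMEMAP = '''<http://www.takebackthe20.com/>;rel="original",
<http://mementoproxy.cs.odu.edu/aggr/timegate/http://www.takebackthe20.com/>;rel="timegate",
<http://api.wayback.archive.org/memento/20100925222153/http://www.takebackthe20.com/> ;rel="first memento";datetime="Sat, 25 Sep 2010 22:21:53 GMT",
<http://api.wayback.archive.org/memento/20101202231759/http://www.takebackthe20.com/>;rel="memento" ;datetime="Thu, 02 Dec 2010 23:17:59 GMT",
<http://api.wayback.archive.org/memento/20101206123128/http://www.takebackthe20.com/> ;rel="last memento";datetime="Mon, 06 Dec 2010 12:31:28 GMT"
'''

# A link is <uri> followed by zero or more ;name="value" parameters.
# Splitting on commas would break, because the datetime values contain commas.
LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
ATTR = re.compile(r'([\w-]+)="([^"]*)"')


def parse_timemap(text):
    """Parse link-format text into dicts with uri, rels, and optional datetime."""
    entries = []
    for uri, params in LINK.findall(text):
        attrs = dict(ATTR.findall(params))
        entry = {"uri": uri, "rels": attrs.get("rel", "").split()}
        if "datetime" in attrs:
            entry["datetime"] = parsedate_to_datetime(attrs["datetime"])
        entries.append(entry)
    return entries


def mementos(entries):
    """Return only the archived copies, oldest first."""
    found = [e for e in entries if "memento" in e["rels"]]
    return sorted(found, key=lambda e: e["datetime"])


if __name__ == "__main__":
    last = mementos(parse_timemap(TIMEMAP))[-1]
    gap = date(2011, 1, 8) - last["datetime"].date()
    print(last["uri"], gap.days, "days before the shooting")
```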
  • 34. Reconciling the live web with what we find in the archives A Research Agenda for "Obsolete Data or Resources" Web Archiving Cooperative Workshop, Stanford, June 29, 2012
  • 35. Richard Grenell Removing His Tweets A Research Agenda for "Obsolete Data or Resources" Web Archiving Cooperative Workshop, Stanford, June 29, 2012
  • 36. 2010 Archive of Grennel's Site… A Research Agenda for "Obsolete Data or Resources" Web Archiving Cooperative Workshop, Stanford, June 29, 2012
  • 37. …But the 2008 Content Is Missing
  • 38. 2008 Content on Live Site But Do You Trust It?
  • 39. Sci-Fi / Alternate History http://2012.talkingpointsmemo.com/2012/06/richard-mourdock-obamacare-youtube-accident.php
  • 40. Sometimes Shared Social Media Persists…
  • 41. Social media archives are more than just fodder for The Daily Show…
  • 42. An Intact Tweet From the Egyptian Revolution slide from Hany SalahEldeen https://twitter.com/miss_amy_qb/status/32477898581483521
  • 43. These Tweets Have Lost Their Content and Their Meaning https://twitter.com/aishes/status/32485352102952960 Missing ? https://twitter.com/omar_chaaban/status/32203697597452289 slide from Hany SalahEldeen
  • 44. Estimating Shared Resource Loss in Social Media for Other Socially Significant Events to appear in TPDL 2012
  • 45. More archives = more better
  • 47. URI Rewriting Makes for Nice Archives The link to: http://i.cdn.turner.com/cnn/2009/TRAVEL/10/26/overseas.visitors.travel/c1main.liberty.gi.jpg is dynamically rewritten, using JavaScript, to: http://web.archive.org/web/20091027043308/http://i.cdn.turner.com/cnn/2009/TRAVEL/10/26/overseas.visitors.travel/c1main.liberty.gi.jpg
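The rewriting shown above amounts to prefixing the original URI with the archive's base and a 14-digit timestamp. A rough sketch of that mapping and its inverse follows; the helper names `rewrite` and `unrewrite` are made up for illustration, and the real Wayback Machine does this in served pages with JavaScript rather than with code like this.

```python
WAYBACK_PREFIX = "http://web.archive.org/web/"


def rewrite(original_uri, timestamp):
    """Build an archive URI of the form <prefix><timestamp>/<original URI>."""
    return f"{WAYBACK_PREFIX}{timestamp}/{original_uri}"


def unrewrite(archived_uri):
    """Split an archive URI back into (timestamp, original URI).

    Raises ValueError when the URI does not have the expected shape.
    """
    if not archived_uri.startswith(WAYBACK_PREFIX):
        raise ValueError("not a rewritten URI: " + archived_uri)
    rest = archived_uri[len(WAYBACK_PREFIX):]
    timestamp, sep, original = rest.partition("/")
    if not sep or not original:
        raise ValueError("missing original URI: " + archived_uri)
    return timestamp, original


if __name__ == "__main__":
    image = ("http://i.cdn.turner.com/cnn/2009/TRAVEL/10/26/"
             "overseas.visitors.travel/c1main.liberty.gi.jpg")
    archived = rewrite(image, "20091027043308")
    print(archived)
    print(unrewrite(archived))
```

Being able to recover the original URI is what lets a client leave one archive and ask another archive, or an aggregator, about the same resource.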
  • 48. Many Archives/Caches Do Not Rewrite URIs Cached version of cnn.com (html only): http://webcache.googleusercontent.com/search?q=cache%3Acnn.com But images, for example, are not relative to SE cache; they're still at: http://i2.cdn.turner.com/cnn/2010/POLITICS/09/23/un.ahmadinejad.walkouts/t1main.ahmadinejad.afp.gi.jpg
  • 49. Some Web Sites Are Just "scp -r" (implicit archives!) http://www.jcdl2007.org/ http://www.jcdl.org/archived-conf-sites/jcdl2007/
  • 50. URI Rewriting is Great -- Until Something Goes Wrong… http://web.archive.org/web/20080302121117/http://www.thecribs.com/ http://web.archive.org/web/20100923232312/http://www.thecribs.com/aa/banners/itunes.gif
  • 51. Where Else Could …/itunes.gif Be? Paradox: URI rewriting makes archives useful for interactive browsing, but it actively inhibits interoperability -- your session becomes trapped in an archive How can you escape the gravitational pull of IA's Wayback Machine and other large archives? You'd like to start an archive, but yours will never be as "good" as theirs…
  • 52. Long Tail of Archives
  • 53. More Archives, More Mementos! 1000 URIs sampled from delicious.com; 1 dot = 1 Memento (x-axis=date of Memento, y-axis=URI of Original Resource); sorted by URI longevity. How Much of the Web is Archived? JCDL 2011
  • 54. For Some Collections, Still Too Few Mementos To Be Found… 1000 URIs sampled from search engine result pages; preference for popular pages removed. note to self: it is better to be popular. How Much of the Web is Archived? JCDL 2011
  • 55. More archives reduce archival uncertainty
  • 56. No Uncertainty With Self-Archiving Systems foo.html has <img src=pic.gif> t0 t1 t2 t3 t4 t5 t6 t7 | | | | | | | | foo.html foo.html foo.html foo.html pic.gif pic.gif pic.gif pic.gif
  • 57. foo.html @ t4 foo.html has <img src=pic.gif> t0 t1 t2 t3 t4 t5 t6 t7 | | | | | | | | foo.html foo.html foo.html foo.html pic.gif pic.gif pic.gif pic.gif Request 1: GET /foo.html, Accept-Datetime: t4; response: HTTP/1.1 200 OK, Memento-Datetime: t4. Request 2: GET /pic.gif, Accept-Datetime: t4; response: HTTP/1.1 200 OK, Memento-Datetime: t0.
  • 58. foo.html @ t4 foo.html has <img src=pic.gif> t0 t1 t2 t3 t4 t5 t6 t7 | | | | | | | | foo.html foo.html foo.html foo.html pic.gif pic.gif pic.gif pic.gif Request 1: GET /foo.html, Accept-Datetime: t4; response: HTTP/1.1 200 OK, Memento-Datetime: t4. Request 2: GET /pic.gif, Accept-Datetime: t4; response: HTTP/1.1 200 OK, Memento-Datetime: t0. Result: foo.html correct; pic.gif correct.
  • 59. Uncertainty in Third-Party Archives foo.html has <img src=pic.gif> t0 t1 t2 t3 t4 t5 t6 t7 | | | | | | | | foo.html foo.html foo.html foo.html pic.gif pic.gif pic.gif pic.gif
  • 60. Missed Updates foo.html has <img src=pic.gif> t0 t1 t2 t3 t4 t5 t6 t7 | | | | | | | | foo.html foo.html foo.html foo.html foo.html pic.gif pic.gif pic.gif pic.gif pic.gif pic.gif red italics = missed updates
  • 61. foo.html @ t4 foo.html has <img src=pic.gif> t0 t1 t2 t3 t4 t5 t6 t7 | | | | | | | | foo.html foo.html foo.html foo.html foo.html pic.gif pic.gif pic.gif pic.gif pic.gif pic.gif Request 1: GET /foo.html, Accept-Datetime: t4; response: HTTP/1.1 200 OK, Memento-Datetime: t4. Request 2: GET /pic.gif, Accept-Datetime: t4; response: HTTP/1.1 200 OK, Memento-Datetime: t0.
  • 62. foo.html @ t4 foo.html has <img src=pic.gif> t0 t1 t2 t3 t4 t5 t6 t7 | | | | | | | | foo.html foo.html foo.html foo.html foo.html pic.gif pic.gif pic.gif pic.gif pic.gif pic.gif Request 1: GET /foo.html, Accept-Datetime: t4; response: HTTP/1.1 200 OK, Memento-Datetime: t4. Request 2: GET /pic.gif, Accept-Datetime: t4; response: HTTP/1.1 200 OK, Memento-Datetime: t0. Result: foo.html correct; pic.gif incorrect (should be t4).
  • 63. foo.html @ t4 foo.html has <img src=pic.gif> t0 t1 t2 t3 t4 t5 t6 t7 | | | | | | | | foo.html foo.html foo.html foo.html foo.html pic.gif pic.gif pic.gif pic.gif pic.gif pic.gif Request 1: GET /foo.html, Accept-Datetime: t4; response: HTTP/1.1 200 OK, Memento-Datetime: t4. Request 2: GET /pic.gif, Accept-Datetime: t4; response: HTTP/1.1 200 OK, Memento-Datetime: t0. Result: foo.html correct; pic.gif incorrect (should be t4). This combination (foo@t4, pic@t0) never existed!
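The foo.html / pic.gif scenario can be sketched as a small simulation. Everything here is an assumption made for illustration: the change and capture times are hypothetical (the slide's exact timeline positions did not survive extraction), the name `resolve` is made up, and the archive is modelled as returning the latest capture at or before the requested time, which is a simplification of how a real archive picks a memento.

```python
from bisect import bisect_right


def latest_at_or_before(times, t):
    """Return the largest value in the sorted list `times` that is <= t, or None."""
    i = bisect_right(times, t)
    return times[i - 1] if i else None


def resolve(changes, captures, t):
    """Simulate a datetime-negotiated request for time t.

    changes:  sorted times at which the live resource actually changed
    captures: sorted times at which the archive stored a copy
    Returns (served capture time, True if that copy is the version live at t).
    """
    served = latest_at_or_before(captures, t)
    if served is None:
        return None, False
    coherent = latest_at_or_before(changes, served) == latest_at_or_before(changes, t)
    return served, coherent


if __name__ == "__main__":
    # Self-archiving system: every change is captured, so captures == changes.
    print("self-archived pic.gif @ t4:", resolve([0, 5], [0, 5], 4))
    # Third-party archive (hypothetical): pic.gif changed at t4, but the
    # crawler only visited at t0 and t6, so the t4 update was missed.
    print("third-party pic.gif @ t4:", resolve([0, 4], [0, 6], 4))
```

In the self-archiving case the served copy always matches the live version, because nothing can change between captures. In the third-party case the archive still answers with 200 OK and Memento-Datetime t0, and nothing in the response reveals that the page being assembled never existed.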
  • 64. Decrease Uncertainty With More Observations? foo.html has <img src=pic.gif> t0 t1 t2 t3 t4 t5 t6 t7 | | | | | | | | foo.html foo.html foo.html foo.html foo.html pic.gif pic.gif pic.gif pic.gif pic.gif pic.gif red italics = missed updates
  • 65. Reaching Through Time % grep "^GET /web/20.*HTTP/1.1" cnn-ia-headers | awk -F"/" '{print $3}' | sort -u 20091026133351js_ 20091026133356 20091026133359js_ 20091026133425 20091026133427 20091026133430js_ 20091026133438 20091026133441 20091026133443 20091026133446 20091026133448 …[deletia]… 20091027220018 20091027220027 20091027220237 20091027220248 20091027224745 20100923125259 ??? 20100923125330 ??? first was: 2009-10-26 13:33:51; root was: 2009-10-27 04:33:08; end was: 2009-10-27 22:47:45; root - first ~= 15 hours; end - first ~= 33 hours http://web.archive.org/web/20091027043308/http://www.cnn.com/index.html
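The "~15 hours" and "~33 hours" figures follow directly from the 14-digit timestamps in the listing. A small sketch of the arithmetic, with helper names (`parse_ts`, `hours_between`) chosen here for illustration:

```python
from datetime import datetime


def parse_ts(ts):
    """Parse the leading 14 digits (YYYYMMDDhhmmss), ignoring a suffix such as 'js_'."""
    return datetime.strptime(ts[:14], "%Y%m%d%H%M%S")


def hours_between(earlier, later):
    """Elapsed hours between two archive timestamps."""
    return (parse_ts(later) - parse_ts(earlier)).total_seconds() / 3600


# Values taken from the slide.
FIRST = "20091026133351js_"  # earliest embedded resource
ROOT = "20091027043308"      # the root page that was requested
END = "20091027224745"       # latest resource before the 2010 outliers

if __name__ == "__main__":
    print(f"root - first: {hours_between(FIRST, ROOT):.1f} hours")
    print(f"end - first:  {hours_between(FIRST, END):.1f} hours")
```

The same function applied to the two entries marked "???" (from September 2010) shows how far a single replayed page can reach when an embedded resource was only captured much later.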
  • 66. ~33 Hours? How About ~8 Years? [figure panels: single archive only; with multiple archives] see: http://spread.cs.odu.edu/root/http%3A%252F%252Fanthraxinvestigation.com%252Findex.html/
  • 67. Publishing and archiving are in a race. Publishing is winning.
  • 69. Reaching Out From the Archive % grep Host: cnn-ia-headers | wc -l 288 % grep Host: cnn-ia-headers | grep -v archive.org | wc -l 117 % grep Host: cnn-ia-headers | grep -v archive.org | sort -u Host: ad.doubleclick.net Host: ads.adsonar.com Host: ads.cnn.com Host: aranet.vo.llnwd.net Host: b.scorecardresearch.com Host: bs.serving-sys.com Host: cnn.dyn.cnn.com Host: ds.serving-sys.com Host: gdyn.cnn.com Host: i.cdn.turner.com Host: i2.cdn.turner.com Host: js.adsonar.com Host: metrics.cnn.com Host: pix04.revsci.net Host: s0.2mdn.net Host: symbolcomplete.marketwatch.com Host: www.adfusion.com http://web.archive.org/web/20091027043308/http://www.cnn.com/index.html
  • 70. Embedded Resources 29 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.youtube.com/user/wichitarecordings
  • 71. How Much Of What We Share Is Preservable? local copy of http://dctheatrescene.com/ same, but with no internet
  • 72. Social Resources http://www.flickr.com/photos/mic_n_2_sugars/84882320/ 1 Memento: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.flickr.com/photos/mic_n_2_sugars/84882320/ http://farm1.static.flickr.com/37/84882320_67fc8915d5_z.jpg (Last-Modified: 10 Jan 2006…) 0 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://farm1.static.flickr.com/37/84882320_67fc8915d5_z.jpg
  • 73. Archiving a user experience, not the user experience.
  • 74. Personalized Resources GET / HTTP/1.1 Host: bit.ly User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 Accept-Language: en-us,en;q=0.5 Accept-Encoding: gzip,deflate Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 Keep-Alive: 115 Connection: keep-alive Cookie: __utma=126736798.4156477295523165000.1251253806.1285119293.1285122783.59; _bit=4c20df7a-003a5-07baf-91a08fa8;anon_u=cHN1X19jN2MwNjcxZC05MWNiLTQ3MmEtOGIxYy1hZDMyMWRlNzc1OTU=| 1284997489|06ac0cefc8ac369e0f9849b5fdfbbe8d077d0c65; user=cGhvbmVkdWRl|1284997489| fdb7f02cacb3cb44416f54d83f3237ec0f7bd9b5; __utmz=126736798.1280940647.33.1.utmcsr=(direct)|utmccn=(direct)| utmcmd=(none); _chartbeat2=ciuph6qrso6tn6w7; _xsrf=49bc661fc02845b3bcbe975d7c2f28de; __utmb=126736798.3.10.1285122783; __utmc=126736798 A Research Agenda for "Obsolete Data or Resources" Web Archiving Cooperative Workshop, Stanford, June 29, 2012
  • 75. Geolocated Resources % curl -I http://www.craigslist.org HTTP/1.1 302 Found Set-Cookie: cl_b=12851300231056905752;path=/;domain=.craigslist.org;expires=01 Jan 2038 00:00:00 GMT Location: http://geo.craigslist.org/ % curl -I http://geo.craigslist.org/ HTTP/1.1 302 Found Content-Type: text/html; charset=iso-8859-1 Connection: close Location: http://norfolk.craigslist.org Date: Wed, 22 Sep 2010 04:33:56 GMT Set-Cookie: cl_b=12851300363085180962;path=/;domain=.craigslist.org;expires=01 Jan 2038 00:00:00 GMT Server: Apache % traceroute geo.craigslist.org traceroute to geo.craigslist.org (208.82.236.208), 64 hops max, 40 byte packets 1 *** 2 10.5.120.1 (10.5.120.1) 9.959 ms 23.004 ms 13.208 ms 3 nrfksysr02-atm151208.hr.hr.cox.net (68.10.8.117) 10.056 ms 10.561 ms 19.970 ms 4 nrfkdsrj01-ge500.0.rd.hr.cox.net (68.10.14.13) 11.142 ms 20.618 ms 10.293 ms 5 ashbbprj02-ae4.0.rd.as.cox.net (68.1.1.232) 15.368 ms 68.854 ms 20.153 ms 6 xe-3-0-0.cr2.dca2.above.net (64.125.26.241) 18.963 ms 23.674 ms 32.977 ms 7 xe-2-2-0.cr2.iah1.us.above.net (64.125.30.53) 46.201 ms 56.156 ms 46.783 ms 8 xe-1-1-0.mpr4.phx2.us.above.net (64.125.28.73) 82.616 ms 82.289 ms 84.383 ms 9 * 64.124.178.62.allocated.above.net (64.124.178.62) 80.893 ms 78.786 ms 10 511.ae9.ecore1p.craigslist.org (208.82.239.102) 95.958 ms 86.160 ms 90.115 ms 11 www.craigslist.org (208.82.236.208) 80.968 ms 91.470 ms 80.110 ms A Research Agenda for "Obsolete Data or Resources" Web Archiving Cooperative Workshop, Stanford, June 29, 2012
  • 76. Is there just a single web to archive?
  • 77. Shadow Web: Mobile 46* Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://twitter.com/timoreilly 0 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://mobile.twitter.com/timoreilly * = 46 mementos in 2010, 22 mementos in 2012
  • 78. Shadow Web: Mobile 17,000+ Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.cnn.com/ 140+ Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://m.cnn.com/
  • 79. Shadow Web: Linked Data (this resource intentionally left blank) http://en.wikipedia.org/wiki/DJ_Shadow http://dbpedia.org/resource/DJ_Shadow Accept: text/html Accept: application/rdf+xml http://dbpedia.org/page/DJ_Shadow http://dbpedia.org/data/DJ_Shadow 2 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://dbpedia.org/resource/DJ_Shadow 0 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://dbpedia.org/data/DJ_Shadow 0 Mementos: http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://dbpedia.org/page/DJ_Shadow
  • 80. A short wish list.
  • 81. Use Memento-Datetime HTTP Header % curl -I https://twitter.com/machawk1/status/218015444496416768 HTTP/1.1 200 OK Date: Fri, 29 Jun 2012 15:50:55 GMT Status: 200 OK X-Transaction: 9a209a8deb15f4ba X-Frame-Options: SAMEORIGIN ETag: "4b79affd0f77a83019f619428f4ebaa5" Expires: Tue, 31 Mar 1981 05:00:00 GMT Last-Modified: Fri, 29 Jun 2012 15:50:55 GMT Content-Type: text/html; charset=utf-8 X-Runtime: 0.20638 Cache-Control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0 Content-Length: 80501 Pragma: no-cache Strict-Transport-Security: max-age=631138519 X-MID: 75f6c6061c2be34447493adc6c33317c61740b5f Set-Cookie: [cookie stuff deleted] X-XSS-Protection: 1; mode=block Vary: Accept-Encoding Server: tfe A Research Agenda for "Obsolete Data or Resources" Web Archiving Cooperative Workshop, Stanford, June 29, 2012
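In the response above, Last-Modified is identical to the request Date, so neither header says when the tweet itself was fixed. As a rough sketch of what the wish amounts to, a server that knows an item's creation time could emit it as a Memento-Datetime header. The function name and the creation time used below are hypothetical; the slide does not give the tweet's actual timestamp.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_datetime_header(created_at):
    """Format a timezone-aware datetime as a Memento-Datetime header line.

    The value uses the same HTTP date format (GMT) as the Date and
    Last-Modified headers shown on the slide.
    """
    if created_at.tzinfo is None:
        raise ValueError("created_at must be timezone-aware")
    as_utc = created_at.astimezone(timezone.utc)
    return "Memento-Datetime: " + format_datetime(as_utc, usegmt=True)


if __name__ == "__main__":
    # Hypothetical creation time, for illustration only.
    created = datetime(2012, 6, 27, 14, 5, 0, tzinfo=timezone.utc)
    print(memento_datetime_header(created))
```

Unlike Last-Modified, which here changes on every request, this value would stay the same each time the resource is fetched.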
  • 82. Richer APIs for Archives <?xml version="1.0"?> <URLEnvelope> <URL text="http://www.example.org" /> <outlinks> <timestamp value="20122020202"> <olink> <href>http://www.anotherexample.org</href> <atext>click here</atext> <window>In the following example click here we show you example</window> </olink> <olink> <href>http://www.anotherexample2.org</href> <atext>click here2</atext> <window>In the following example click 2 we show you example</window> </olink> </timestamp> </outlinks> <inlinks> <ilink> <href>http://www.myexample.org</href> <atext>Good Example</atext> <window>In the following example click here we show you interesting example</window> <timestamp value="20122020202"/> </ilink> <ilink> <href>http://www.myexample.org</href> <atext>Good Example</atext> <window>In the following example click here we show you interesting example</window> <timestamp value="201240402042"/> </ilink> </inlinks> </URLEnvelope> (annotations: outlinks, with context, datestamps; inlinks, with context, datestamps; possible now, but there is a bootstrapping problem of proving value to the archive)
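To give a feel for how a client might consume such an envelope, here is a minimal sketch using Python's standard XML parser. The element names come from the slide; the envelope is a proposal on the slide rather than an existing archive API, and the sample below is abbreviated to one outlink and one inlink.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the slide's proposed envelope.
ENVELOPE = """<?xml version="1.0"?>
<URLEnvelope>
  <URL text="http://www.example.org" />
  <outlinks>
    <timestamp value="20122020202">
      <olink>
        <href>http://www.anotherexample.org</href>
        <atext>click here</atext>
        <window>In the following example click here we show you example</window>
      </olink>
    </timestamp>
  </outlinks>
  <inlinks>
    <ilink>
      <href>http://www.myexample.org</href>
      <atext>Good Example</atext>
      <window>In the following example click here we show you interesting example</window>
      <timestamp value="20122020202"/>
    </ilink>
  </inlinks>
</URLEnvelope>"""


def outlinks(root):
    """Yield (timestamp, href, anchor text) for each outlink; timestamps wrap the links."""
    for ts in root.findall("./outlinks/timestamp"):
        for link in ts.findall("olink"):
            yield ts.get("value"), link.findtext("href"), link.findtext("atext")


def inlinks(root):
    """Yield (timestamp, href, anchor text) for each inlink; timestamps are child elements."""
    for link in root.findall("./inlinks/ilink"):
        ts = link.find("timestamp")
        stamp = ts.get("value") if ts is not None else None
        yield stamp, link.findtext("href"), link.findtext("atext")


if __name__ == "__main__":
    root = ET.fromstring(ENVELOPE)
    print("URL:", root.find("URL").get("text"))
    print("outlinks:", list(outlinks(root)))
    print("inlinks:", list(inlinks(root)))
```

Note the asymmetry in the proposed format: outlinks are grouped under a timestamp, while each inlink carries its own, so the two helpers walk the tree differently.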
  • 83. Closing thoughts.
  • 84. Open Problem: Compelling Applications • UI / usage idioms – remember "lost in hyperspace"? • What is the right metaphor? – VCR controls through versions? – "Track Changes" controls? • Search like it's 1999 – yeah, we all want search for the Wayback Machine, but like a dog chasing a truck… • If you build the archive APIs, will the archive-based mashups come? Sidebar quote: "A lot of times, people don't know what they want until you show it to them." (May 25, 1998)
  • 85. Open Problem: Conceptual Gap • What archives offer: access by URI + timestamp – "what did cnn.com look like on May 31, 2007?" • What users want: concepts through time – "how has public opinion about the `affordable health care act' changed through time?" • hint: tag clouds aren't enough • to answer this question, you would likely have to find a current page that talks about the past
  • 86. Open Problem: Archival Authenticity • Right now, we just implicitly trust Brewster and everyone at the IA • The only reason the politicians/pundits in previous examples didn't cheat is because they didn't know it was an option – black hat archives • What happens when there are multiple archives and they disagree? – spam archives? – "soft 401s"? – resolving archival disputes? • esp. if different archives can legitimately see different representations for the same URIs?
  • 87. Open Problem: Monetizing The Archive • Until $ can be made, archives will labor in the shadows • OTOH, without monetization archives are relatively free of spammers, lawyers, and other predators Rogers' Innovation Curve http://en.wikipedia.org/wiki/Diffusion_of_innovations
  • 88. Five Easy Pieces • Preservation not for privileged priesthood • no more hoary stories about format obsolescence: http://blog.dshr.org/2010/09/reinforcing-my-point.html http://doi.acm.org/10.1145/1592761.1592794 http://booktwo.org/notebook/wikipedia-historiography/ • archiving as branded service, not infrastructure http://blog.dshr.org/2010/06/jcdl-2010-keynote.html • Don't desiccate resources; leave them on the web • Endless metadata is not preservation… http://www.dlib.org/dlib/december05/12contents.html [too many to list]