Preserving the scholarly record with WebCite (www.webcitation.org): an archiving system for long-term digital preservation of cited webpages

    Presentation Transcript

    1. WebCite® (www.webcitation.org). Gunther Eysenbach MD MPH, Editor/Publisher, J Med Internet Res; Associate Professor, Department of Health Policy, Management and Evaluation, & KMDI, University of Toronto; Senior Scientist, Centre for Global eHealth Innovation, Division of Medical Decision Making and Health Care Research; Toronto General Research Institute of the UHN, Toronto General Hospital, Canada
    2. Mission: WebCite® is an on-demand archiving system (controlled by citing and cited authors, editors, and publishers) that enables long-term digital preservation and citability of any kind of Internet-accessible object: webpages, blogs, wikis, data files (e.g. spreadsheets), PDF reports, “grey” research reports, preprints, etc.
    3. E-publishing & Open Access Research Group at the CGEI, Toronto
      • Journal of Medical Internet Research (www.jmir.org),
        • Living publishing lab
        • a pioneer in Open Access publishing (10 yrs)
        • Leading journal in its discipline (Impact Factor 3.0)
        • “triple-O” philosophy (open access, open source, open peer review)
        • Open-source contributions include code for OJS and XML-typesetting software (originally © MJ Suhonos, G. Eysenbach, J Alperin; the code was released under GNU and forms the basis of the PKP Lemon8 project)
      • CIHR-funded research on the impact of Open Access on knowledge translation (see e.g. Eysenbach G. PLoS Biol 4(5): e157)
      • Publishing innovations incl. WebCite® (www.webcitation.org)
    4. www.jmir.org
    5. Authors increasingly cite non-traditional (web)references
      • Webpages (e.g. personal homepages)
      • “grey” PDF reports (e.g. research progress reports, etc.)
      • Blogs
      • Wikis
      • Datasets which are available online
      Note: In this talk, “webpages” or “webreferences” stands for any electronic digital object that can be cited and is non-traditional (i.e., has no DOI)
    6.  
    7. Problem 1: URLs go “dead”
    8.  
    9. In one study published in the journal Science, 13% of Internet references in scholarly articles were inactive after only 27 months (Dellavalle RP, Hester EJ, Heilig LF, Drake AL, Kuntzman JW, Graber M, et al. Information science. Going, going, gone: lost Internet references. Science 2003 Oct 31;302(5646):787-788. DOI:10.1126/science.1088234)
    10. Problem 2: Even if URLs don’t go “dead”, their content may change
    11. Eysenbach G. Towards quality management of medical information on the internet: evaluation, labelling, and filtering of information. BMJ 1998;317:1496-1502
    12. Today, that site looks different… medpics.org
    13. Wikis and Blogs change constantly
    14. The homepage of a blog shows the most recent posts only
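Problems 1 and 2 can be told apart mechanically once an archived copy exists. The sketch below (hypothetical function names, standard-library `hashlib` only) labels a citation as dead, changed, or unchanged by comparing a SHA-256 fingerprint of the archived bytes with the bytes the URL returns today; fetching the live page is left out and represented by `None` when the request fails.

```python
import hashlib
from typing import Optional


def fingerprint(content: bytes) -> str:
    """SHA-256 hex digest of a page's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def citation_status(archived: bytes, live: Optional[bytes]) -> str:
    """Compare the archived copy of a cited page with what its URL returns now.

    `live` is None when the URL could not be retrieved (e.g. HTTP 404).
    """
    if live is None:
        return "dead"        # Problem 1: the URL no longer resolves
    if fingerprint(live) != fingerprint(archived):
        return "changed"     # Problem 2: same URL, different content
    return "unchanged"


archived = b"<html>Site as cited in 1998</html>"
print(citation_status(archived, None))                        # dead
print(citation_status(archived, b"<html>Redesigned</html>"))  # changed
print(citation_status(archived, archived))                    # unchanged
```

Byte-level hashing flags any difference, including rotating ads or timestamps, so a practical checker would likely normalize the page text before comparing.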
    15. Problem 3: Internet material not deemed “citable” (impedes the use of blogs, wikis, online sharing of datasets, etc.)
    16. Editors often discourage citing web material (including datasets). Source: URL:http://www.plantphysiol.org/misc/ifora.shtml. Accessed: 2008-06-26. (Archived by WebCite® at http://www.webcitation.org/5YsaBISU5)
    17. Internet material not considered citable (deemed unstable, not archived); fear of plagiarism / not getting credit
      • Authors are reluctant to
        • make data and datasets accessible online
        • participate in collaborative projects (wikis)
        • share information in blogs
    18. Problem 4: Crawler-based archiving insufficient
    19. Limitations of crawler-based archiving
      • No author-initiated, on-demand archiving at a given date/time
      • “Shotgun” approach
      • Crawler cannot go everywhere (“hidden web”)
      • No impact statistics (how often has my archived copy been retrieved?)
      • Impossible to curate
      WebCite = Web Archiving 2.0
    20. The solution: WebCite ®
      • First mentioned as an idea and implemented as a prototype in 1998 (Eysenbach, BMJ 1998;317:1496-1502)
      • Project idea revived in 2004/2005
      • First implemented by J Med Internet Res
      • Today, used by >200 journals and large publishers (including BioMed Central, Oxford University Press)
      • Became a member of the International Internet Preservation Consortium in 2008
    21. [Diagram: “What the world needs”. Participants: citing author, cited author (self-archiving), publisher/editor, and WebCite® with its /archive, /comb, and /bookmarklet interfaces, mirrored by the IA, libraries, and digital preservation partners. Example: a sample citing paper by “J. Author” (“This is a sample citing paper [1].”) with the reference “Doe J. www.citedwebsite.com/exmpl [Accessed 1.1.2004]”. Further labels: XML manuscript with DOI®; DOI® server; optional DOI assignment; CrossRef® forward linking; WebCite® link resolver; snapshot retrieval request (DOI with hash) by the reader; reverse (citation-triggered), self (author-triggered), and third-party archiving; dynamic vs static content.]
    22. [Diagram: simplified build of slide 21 showing the citing author, WebCite® (/archive, /comb, /bookmarklet), mirrors at the IA, libraries, and digital preservation partners, third-party archiving, and the reader’s snapshot retrieval request, with the same sample citing paper.]
    23.  
    24.  
    25. Two possible citation formats to cite the WebCite snapshot:
      • Opaque (ID-based): Eysenbach, Gunther. Gunther Eysenbach Random Research Rants Blog. 2008-06-26. URL:http://gunther-eysenbach.blogspot.com. Accessed: 2008-06-26. (Archived by WebCite® at http://www.webcitation.org/5YreMGRz7)
      • Transparent: Eysenbach, Gunther. Gunther Eysenbach Random Research Rants Blog. 2008-06-26. http://www.webcitation.org/query?url=http%3A%2F%2Fgunther-eysenbach.blogspot.com&date=2008-06-26
      Note that other formats exist as well: hash-based and citing-document-DOI-based.
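Both reference styles on this slide can be generated from the same metadata. In this sketch the `query?url=…&date=…` pattern, the “Archived by WebCite®” wording, and the snapshot ID come from the slide; the helper names are hypothetical. Percent-encoding the cited URL with `quote(..., safe='')` reproduces the transparent link exactly as it appears on the slide.

```python
from urllib.parse import quote

WEBCITE = "http://www.webcitation.org"


def transparent_url(cited_url: str, date: str) -> str:
    """Transparent format: cited URL and date stay readable in the link."""
    return f"{WEBCITE}/query?url={quote(cited_url, safe='')}&date={date}"


def opaque_reference(author: str, title: str, date: str, cited_url: str,
                     accessed: str, snapshot_id: str) -> str:
    """Opaque format: the reference points at an ID-based snapshot URL."""
    return (f"{author}. {title}. {date}. URL:{cited_url}. Accessed: {accessed}. "
            f"(Archived by WebCite® at {WEBCITE}/{snapshot_id})")


print(transparent_url("http://gunther-eysenbach.blogspot.com", "2008-06-26"))
# http://www.webcitation.org/query?url=http%3A%2F%2Fgunther-eysenbach.blogspot.com&date=2008-06-26
```

The transparent form can be built by anyone who knows the URL and date, while the opaque form requires the ID returned when the snapshot was taken.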
    26. Reader point of view: to retrieve archived material, the reader simply clicks on the WebCite link.
      • Step 1: The reader clicks on the cited webcitation URL (on 1.1.2005), e.g. reference [1] in the sample paper by “J. Author”: Doe J. www.webcitation.org?cache_url=www.citedwebsite.com/exmpl&cache_date=31.1.2003 [Accessed 31.1.2004]
      • Step 2: The request is redirected to webcitation.org.
      • Step 3: WebCite attempts to retrieve the “live” cited URL (www.citedwebsite.com/exmpl); if it is not found (ERROR: NOT FOUND), WebCite displays the cached version (and/or other versions).
      • Step 4: WebCite displays the cached version (timestamp 31.1.2004).
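The retrieval sequence on this slide amounts to a simple fallback rule. The sketch below mirrors steps 3 and 4 under stated assumptions: `fetch_live` is a caller-supplied function that returns `None` for a dead URL, the cache is an in-memory dictionary, and the names `Snapshot` and `resolve` are hypothetical. The slide does not say what the reader sees when the live page still exists, so this version simply reports it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple, Union


@dataclass
class Snapshot:
    url: str
    timestamp: str   # archiving date, e.g. "2004-01-31"
    content: bytes


Result = Tuple[str, Union[bytes, Snapshot, None]]


def resolve(cited_url: str,
            fetch_live: Callable[[str], Optional[bytes]],
            cache: Dict[str, Snapshot]) -> Result:
    """Step 3: try the live URL; step 4: otherwise fall back to the cached copy."""
    live = fetch_live(cited_url)      # None stands for "ERROR: NOT FOUND"
    if live is not None:
        return ("live", live)
    snapshot = cache.get(cited_url)   # display the cached version
    if snapshot is not None:
        return ("cached", snapshot)
    return ("missing", None)


# Usage: the cited site is gone, so the reader gets the archived copy.
cache = {"www.citedwebsite.com/exmpl":
         Snapshot("www.citedwebsite.com/exmpl", "2004-01-31", b"cited text")}
kind, result = resolve("www.citedwebsite.com/exmpl", lambda url: None, cache)
print(kind, result.timestamp)   # cached 2004-01-31
```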
    27. Bookmarklet: can be used to rapidly archive the currently viewed webpage (the bookmarklet hands over the current URL and the email address of the citing author to the WebCite server)
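In the browser, a bookmarklet is a small piece of JavaScript that reads the address of the page being viewed; the request it hands to the server can be sketched in Python as follows. The endpoint path `/archive` and the parameter names `url` and `email` are assumptions, not taken from the slides.

```python
from urllib.parse import urlencode

# Assumption: endpoint and parameter names used by the bookmarklet.
ARCHIVE_ENDPOINT = "http://www.webcitation.org/archive"


def archive_request(current_url: str, email: str) -> str:
    """Build the request a bookmarklet would send for the page being viewed."""
    # urlencode percent-encodes "/", ":" and "@" so both values survive intact.
    return ARCHIVE_ENDPOINT + "?" + urlencode({"url": current_url, "email": email})


print(archive_request("http://www.example.org/a", "j.author@example.org"))
# http://www.webcitation.org/archive?url=http%3A%2F%2Fwww.example.org%2Fa&email=j.author%40example.org
```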
    28. [Diagram: build of slide 21 adding the cited author and self-archiving: reverse (citation-triggered), self (author-triggered), and third-party archiving; reader retrieval from WebCite® and its mirrors at the IA, libraries, and digital preservation partners; dynamic vs static content; same sample citing paper.]
    29. As a “potentially cited” author, I can self-archive and add a static WebCite-enriched reference as a citation suggestion…
    31. … or I can provide a dynamic link to the WebCite archiving form (“WebCite this!”)
    33. Clicking “WebCite this!” populates the archiving form with metadata provided by the cited author
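A “WebCite this!” link is just a URL that pre-fills the archiving form. The sketch below builds the HTML anchor a blogger might paste under a post; the form address and the field names `url`, `title`, `author`, and `date` are hypothetical, since the slides do not list them.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical: form address and metadata field names are not documented here.
FORM_URL = "http://www.webcitation.org/archive"


def webcite_this_link(page_url: str, title: str, author: str, date: str) -> str:
    """Return an HTML anchor whose query string pre-fills the archiving form."""
    query = urlencode({"url": page_url, "title": title,
                       "author": author, "date": date})
    # escape() turns "&" into "&amp;" so the href is valid inside HTML.
    return f'<a href="{escape(FORM_URL + "?" + query)}">WebCite this!</a>'


print(webcite_this_link("http://blog.example.org/post-1", "My post",
                        "Doe J", "2008-06-26"))
```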
    34.  
    35. (the same approach can be used by authors of wikis, datasets etc.)
    36. Implementation from a publisher / editor point of view
    37. Level 1-4 implementation (ordered by time elapsed since the author saw the cited webdocument)
      • Level 1: Author “webcites” the document immediately (or the reference manager takes care of this); editors stipulate this in their Instructions for Authors
      • Level 2: Editor/copyeditor “webcites” the cited document before publication
      • Level 3: WebCite® archives cited webreferences immediately on publication (combing XML files)
      • Level 4: Retrospective focused crawling of old articles
    38.  
    39. Level 1 implementation by journal editors: Instructions for Authors
    40. [Diagram: build of slide 21 adding the publisher/editor (/archive, /comb) and the XML manuscript with DOI®: reverse (citation-triggered), self (author-triggered), and third-party archiving; CrossRef® forward linking; mirrors at the IA, libraries, and digital preservation partners; same sample citing paper.]
    41.  
    42. Implemented by >200 journals
    43. What’s next Future developments
    44. WebCite 2.0
      • User accounts
      • Enables users to view a list of the snapshots they created (and to categorize and export them, e.g. in BibTeX, Refman, etc.)
      • Enables tagging, “crowdsourcing” of curation tasks such as metadata entering & reconciliation
      • Recommender service (people who cited x also cited y)
      • Post-publication peer-review (others can rate documents)
      • For cited authors
        • WebCite® Impact Factor (access / citation statistics, which can be used for tenure & promotion purposes)
        • WebCitation-Alert service
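The planned recommender (“people who cited x also cited y”) is essentially co-citation counting. A minimal sketch, assuming the archive can list, for each citing document, the set of URLs it WebCited; the function name and the data below are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Set


def also_cited(target: str, citations: Dict[str, Set[str]], top_n: int = 3) -> List[str]:
    """Rank URLs by how often they are cited alongside `target`."""
    counts: Counter = Counter()
    for cited_urls in citations.values():
        if target in cited_urls:
            counts.update(u for u in cited_urls if u != target)
    # Highest co-citation count first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [url for url, _ in ranked[:top_n]]


citations = {
    "paper-1": {"x.org", "y.org", "z.org"},
    "paper-2": {"x.org", "y.org"},
    "paper-3": {"z.org", "w.org"},
}
print(also_cited("x.org", citations))   # ['y.org', 'z.org']
```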
    45.  
    46. Implementation of WebCite® in tools facilitating “archive as you cite”
      • Bibliographic management systems (EndNote, Reference Manager) and shared bookmarks (Connotea, CiteULike)
      • XML-editing software (Word 2007 XML-addin, Lemon8 etc.)
      • Plugin for OJS and other manuscript management systems (allowing authors to automatically WebCite all references in their manuscript)
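An “archive as you cite” plugin first has to find the web addresses in a manuscript’s reference list. The sketch below uses a simple regular expression that catches `http(s)://` links and bare `www.` addresses; it is an assumption about how such a plugin might work, not a description of any existing OJS plugin, and each URL it returns would then be submitted to the archiving service.

```python
import re
from typing import List

# Matches explicit http(s) URLs and scheme-less "www." addresses.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s\]>]+")


def cited_urls(references: List[str]) -> List[str]:
    """Collect the distinct web addresses cited in a reference list, in order."""
    found: List[str] = []
    for ref in references:
        for match in URL_PATTERN.findall(ref):
            url = match.rstrip(".,;)")           # drop trailing punctuation
            if not url.startswith(("http://", "https://")):
                url = "http://" + url            # assume http for bare www.
            if url not in found:
                found.append(url)
    return found


references = [
    "Doe J. www.citedwebsite.com/exmpl [Accessed 1.1.2004]",
    "Roe R. Report. URL:http://example.org/report.pdf. Accessed: 2008-06-26.",
    "Smith A. Article in a journal. J Med Internet Res 2008.",
]
print(cited_urls(references))
```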
    47. WebCite® works within the International Internet Preservation Consortium (IIPC)
      • To collect and preserve a rich body of Internet content from around the world
      • To foster the development and use of common tools, techniques and standards that enable the creation of international archives
      • To encourage and support national libraries everywhere to address Internet collecting and preservation
      • http://netpreserve.org
    48. 2008 IIPC Members (38)
      • Asia
        • Jewish National and University Library (Israel)
        • National Diet Library, Japan
        • National Library Board, Singapore
        • National Library of China
      • Europe
        • Biblioteca de Catalunya (Library of Catalonia)
        • Biblioteca Nazionale Centrale di Firenze (National Library of Italy, Florence)
        • Biblioteka Narodowa (National Library of Poland)
        • Bibliotheque nationale de France (National Library of France)
        • British Library (U.K.)
        • Deutsche Nationalbibliothek (German National Library)
        • European Archive Foundation
        • Hanzo Archives Ltd. (U.K.)
        • Kansalliskirjasto (National Library of Finland)
        • Koninklijke Bibliotheek (National Library of the Netherlands)
        • Kungl. biblioteket (National Library of Sweden)
        • Landsbokasafn Islands – Haskolabokasafn (National and University Library of Iceland)
        • Latvijas Nacionālā bibliotēka (National Library of Latvia)
        • Nacionalna i sveučilišna knjižnica u Zagrebu (National and University Library in Zagreb, Croatia)
        • Narodna in univerzitetna knjižnica (National and University Library, Slovenia)
        • Národní knihovna České republiky (National Library of the Czech Republic)
        • Nasjonalbiblioteket (National Library of Norway)
        • National Archives (U.K.)
        • National Library of Scotland
        • Netarchive.dk (Royal Library and the State and University Library, Aarhus)
        • Österreichische Nationalbibliothek (Austrian National Library)
        • Schweizerische Nationalbibliothek (Swiss National Library)
        • Virtual Knowledge Studio – Royal Netherlands Academy for Arts and Sciences
      • North America
        • Bibliothèque et Archives Nationales du Québec (BAnQ)
        • California Digital Library (U.S.)
        • Centre for Global eHealth Innovation, WebCite® Internet Citations Archiving Project (Canada)
        • Internet Archive (U.S.)
        • Library and Archives Canada
        • Library of Congress (U.S.)
        • Library of Virginia (U.S.)
        • United States Government Printing Office
        • University of North Texas Libraries (U.S.)
      • Oceania
        • National Library of Australia
        • National Library of New Zealand
    49. The vision
      • A global infrastructure (standard APIs)
        • for cross-archive searching of cited URLs (by URL & date)
        • Decentralized storage of archived web material
      • Pilot project with WebCite®, Internet Archive, and Library and Archives Canada
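Cross-archive searching by URL and date can be sketched as a nearest-date lookup over several archive indexes. The function name, archive contents, and dates below are invented placeholders, and real archives would be queried through the standard APIs the slide calls for rather than held in memory.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory stand-in for one archive's index: URL -> snapshot dates.
ArchiveIndex = Dict[str, List[date]]


def closest_snapshot(url: str, cited_on: date,
                     archives: Dict[str, ArchiveIndex]) -> Optional[Tuple[str, date]]:
    """Search every archive for `url`; return (archive, date) nearest to cited_on."""
    best: Optional[Tuple[int, str, date]] = None
    for name, index in archives.items():
        for snapshot_date in index.get(url, []):
            gap = abs((snapshot_date - cited_on).days)
            if best is None or gap < best[0]:
                best = (gap, name, snapshot_date)
    return None if best is None else (best[1], best[2])


archives = {
    "WebCite": {"http://example.org/": [date(2008, 6, 26)]},
    "Internet Archive": {"http://example.org/": [date(2008, 1, 1), date(2008, 6, 20)]},
}
print(closest_snapshot("http://example.org/", date(2008, 6, 25), archives))
# ('WebCite', datetime.date(2008, 6, 26))
```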
    50. Summary: What WebCite® contributes
      • Links/URLs no longer go 404 (dead)
      • WebCite’d content does not change
      • Internet material can be deemed citable and “archived”
        • Encourages “openness” (authors contribute to blogs, wikis, etc., and make their datasets available)
        • Takes the submission load off journals – much of the scholarly communication can take place outside of journals
      • Provides access/impact statistics for cited authors
      • Enables one-click self-archiving
      • “Internet Archiving 2.0”: enables archiving of the “hidden/deep web” (where crawlers cannot go) and collaborative assignment of metadata
    51. Call for action
      • If you are a citing author: use WebCite the next time you cite a non-journal URL
      • If you are a blogger or a (potentially cited) author publishing online in any other way: put a “WebCite this!” link on your page
      • If you are an editor/publisher: implement WebCite in your workflow (instructions for authors, copyeditors, XML production department)
      • If you are a librarian: contact us to become a long-term preservation partner
    52. www.medicine20congress.com, Toronto, Sept 4-5, 2008
    53. Thank you!
      • Funding
      • Change Foundation, Canadian Institutes of Health Research, NSERC, European Union, SSHRC
      Dr G. Eysenbach
      Email: geysenba at uhnres.utoronto.ca or @gmail.com
      My peer-reviewed journal: http://www.jmir.org
      My blog: http://gunther-eysenbach.blogspot.com
      My conferences: http://www.medicine20congress.com, http://www.ehealthcongresss.org
      My slides: http://www.slideshare.net/eysen
    54. Appendix
    55. Copyright Issues
      • WebCite® honors robot exclusion standards and “no-archive” tags
      • Copyright holders can request removal of material
      • “Fair use” defence (use for non-profit/scholarly purposes, only part of the site archived, etc.)
      • A U.S. court ruled that Google’s caching does not constitute copyright infringement, on the grounds of fair use and an implied license (Field v. Google, U.S. District Court, District of Nevada, CV-S-04-0413-RCJ-LRL)
      • In the future, WebCite® may also
        • Allow copyright holders to specify a per-access royalty fee
        • Long-term goal: WebCite® does not physically store anything but instead deposits the material with the respective national libraries etc., which often have a legal deposit mandate*
      *Legal deposit: a copy of any work published in COUNTRY must be deposited with the National Library of COUNTRY
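Honoring robot exclusion and “no-archive” tags before taking a snapshot could look like the following sketch, which uses Python’s standard `urllib.robotparser` and `html.parser`. The user-agent string `WebCite` and the function names are assumptions, and the robots.txt lines and page HTML are passed in directly instead of being fetched.

```python
from html.parser import HTMLParser
from typing import List
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def may_archive(url: str, page_html: str, robots_lines: List[str],
                agent: str = "WebCite") -> bool:
    """True only if robots.txt allows the URL and the page has no noarchive tag."""
    robots = RobotFileParser()
    robots.parse(robots_lines)
    if not robots.can_fetch(agent, url):
        return False          # excluded by robots.txt
    meta = RobotsMetaParser()
    meta.feed(page_html)
    return "noarchive" not in meta.directives


robots = ["User-agent: *", "Disallow: /private/"]
print(may_archive("http://example.org/public/a.html", "<html></html>", robots))   # True
```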
    56. WebCite® is a disruptive technology
      • If online articles/material are
        • Permanently archived and “citable”
        • Findable
        • “Rankable” (post-publication peer review)
        • (all of which WebCite® plans to implement)
      • … what will be the role of the traditional scholarly journal publication?
        • Quality of pre-publication peer-review, editing, copyediting is key
        • Value-added services (e.g. semantic markup, curation)
    57. NLM XML reference with the WebCite snapshot ID as pub-id:
      <ref id="ref19"><label>19</label>
        <nlm-citation citation-type="web"><article-title>Who Gets ALS</article-title> <source>ALS Association</source> <access-date>2008 Apr 25</access-date>
          <comment><ext-link xlink:type="simple" xlink:href="http://www.alsa.org/als/who.cfm" ext-link-type="uri">http://www.alsa.org/als/who.cfm</ext-link></comment>
          <pub-id pub-id-type="other">5Y0NuDIU9</pub-id>
        </nlm-citation>
      </ref>
    58. The same reference with a transparent WebCite link:
      <ref id="ref19"><label>19</label>
        <nlm-citation citation-type="web"><article-title>Who Gets ALS</article-title> <source>ALS Association</source> <access-date>2008 Apr 25</access-date>
          <comment><ext-link xlink:type="simple" xlink:href="http://www.webcitation.org/query?url=http://www.alsa.org/als/who.cfm&amp;date=2008-04-25" ext-link-type="uri">http://www.webcitation.org/query?url=http://www.alsa.org/als/who.cfm&amp;date=2008-04-25</ext-link></comment>
        </nlm-citation>
      </ref>
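The move from slide 57 to slide 58 (dropping the snapshot ID and rewriting the link) can be automated during XML production. A minimal sketch with Python’s standard `xml.etree.ElementTree`, assuming the reference layout shown on the slides: the function name is hypothetical, the sample adds an `xmlns:xlink` declaration so it parses on its own, and the cited URL is percent-encoded as in the transparent format of slide 25, whereas slide 58 prints it unencoded.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from urllib.parse import quote

XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)
HREF = f"{{{XLINK}}}href"   # ElementTree's name for xlink:href


def to_transparent(ref_xml: str) -> str:
    """Replace the ID-based pub-id with a transparent WebCite query link."""
    ref = ET.fromstring(ref_xml)
    cit = ref.find("nlm-citation")
    raw_date = cit.findtext("access-date")          # e.g. "2008 Apr 25"
    iso_date = datetime.strptime(raw_date, "%Y %b %d").strftime("%Y-%m-%d")
    link = cit.find("comment/ext-link")
    cited = link.get(HREF).strip()
    archived = ("http://www.webcitation.org/query?url="
                f"{quote(cited, safe='')}&date={iso_date}")
    link.set(HREF, archived)
    link.text = archived
    for pub_id in cit.findall("pub-id[@pub-id-type='other']"):
        cit.remove(pub_id)                          # snapshot ID no longer needed
    return ET.tostring(ref, encoding="unicode")


SAMPLE = """<ref id="ref19" xmlns:xlink="http://www.w3.org/1999/xlink"><label>19</label>
<nlm-citation citation-type="web"><article-title>Who Gets ALS</article-title>
<source>ALS Association</source><access-date>2008 Apr 25</access-date>
<comment><ext-link xlink:type="simple" xlink:href="http://www.alsa.org/als/who.cfm"
ext-link-type="uri">http://www.alsa.org/als/who.cfm</ext-link></comment>
<pub-id pub-id-type="other">5Y0NuDIU9</pub-id></nlm-citation></ref>"""

print(to_transparent(SAMPLE))
```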

    (Talk at the 12th International Conference on Elect…)

    © All Rights Reserved