Slideshare.net (beta)

 


Preserving the scholarly record with WebCite (www.webcitation.org): an archiving system for long-term digital preservation of cited webpages

From eysen, 1 month ago

(Talk at the 12th International Conference on Electronic Publishin more

 

Tags

internet archiving; digital preservation; citing web material archive 2.0 citation


Slideshow transcript

Slide 1: WebCite® (www.webcitation.org) Gunther Eysenbach MD MPH, Editor/Publisher, J Med Internet Res; Associate Professor, Department of Health Policy, Management and Evaluation, & KMDI, University of Toronto; Senior Scientist, Centre for Global eHealth Innovation, Division of Medical Decision Making and Health Care Research; Toronto General Research Institute of the UHN, Toronto General Hospital, Canada

Slide 2: Mission WebCite® is an on-demand archiving system (controlled by citing and cited authors, editors, and publishers), which enables long-term digital preservation and citability of any kind of Internet-accessible object* (*webpages, blogs, wikis, data files e.g. spreadsheets, PDF reports, “grey” research reports, preprints etc.)

Slide 3: E-publishing & Open Access Research Group at the CGEI, Toronto • Journal of Medical Internet Research (www.jmir.org) – Living publishing lab – a pioneer in Open Access publishing (10 yrs) – Leading journal in its discipline (Impact Factor 3.0) – “triple-O” philosophy (open access, open source, open peer-review) – OS contributions include contributions to OJS and XML-typesetting software (originally © MJ Suhonos, G. Eysenbach, J Alperin; code released under GNU forms basis for PKP Lemon8 project) • CIHR-funded research on the Impact of Open Access on Knowledge Translation (see e.g. Eysenbach. PLoS Biol 4(5): e157) • Publishing innovations incl. WebCite® (www.webcitation.org)

Slide 4: www.jmir.org

Slide 5: Authors increasingly cite non-traditional (web) references • Webpages (e.g. personal homepages) • “Grey” PDF reports (e.g. research progress reports, etc.) • Blogs • Wikis • Datasets which are available online Note: For the purpose of this talk I refer to “webpages” or webreferences, but what I really mean is any sort of digital object that can be cited and which can be deemed non-traditional (not having a DOI)

Slide 7: Problem 1: URLs go “dead”

Slide 9: Attrition rate of cited non-journal URLs [Chart: % URLs still working (0-100) plotted against years (0-12)] In one study published in the journal Science, 13% of Internet references in scholarly articles were inactive after only 27 months. Dellavalle RP, Hester EJ, Heilig LF, Drake AL, Kuntzman JW, Graber M, et al. Information science. Going, going, gone: lost Internet references. Science 2003 Oct 31;302(5646):787-788. DOI:10.1126/science.1088234

Slide 10: Problem 2: Even if URLs don’t go “dead”, their content may change

Slide 11: Eysenbach G. Towards quality management of medical information on the internet: evaluation, labelling, and filtering of information. BMJ 1998;317:1496-1502

Slide 12: Today, that site looks different… medpics.org

Slide 13: Wikis and Blogs change constantly

Slide 14: The homepage of a blog shows the most recent posts only

Slide 15: Problem 3: Internet material not deemed “citable” (impedes the use of blogs, wikis, online-sharing of datasets etc.)

Slide 16: Editors often discourage citing web material (including datasets) URL:http://www.plantphysiol.org/misc/ifora.shtml. Accessed: 2008-06-26. (Archived by WebCite® at http://www.webcitation.org/5YsaBISU5)

Slide 17: • Fear of plagiarism / not getting credit • Internet material not considered citable (deemed unstable, not archived) • Authors are reluctant to – make data and datasets accessible online – participate in collaborative projects (wikis) – share information in blogs

Slide 18: Problem 4: Crawler-based archiving insufficient

Slide 19: Limitations of crawler-based archiving • No author-initiated, on-demand archiving on a given date/time • “Shotgun” approach • Crawler cannot go everywhere (“hidden web”) • No impact statistics (how often has my archived copy been retrieved?) • Impossible to curate WebCite = Web Archiving 2.0

Slide 20: The solution: WebCite® • First mentioned as an idea and implemented as a prototype in 1998 (Eysenbach, BMJ 1998;317:1496-1502) • Project idea revived in 2004/2005 • First implemented by J Med Internet Res • Today, used by >200 journals and large publishers (including BioMed Central, Oxford University Press) • Became a member of the International Internet Preservation Consortium in 2008

Slide 21: [Diagram of the WebCite® architecture. Labels: Reverse (citation-triggered) archiving; Self (author-triggered) archiving; Citing Author (/archive, /bookmarklet, /comb); Cited Author ((dynamic content); /archive (self-archiving) (static content)); sample citing paper “What the world needs” by J. Author with reference “8. Doe J. www.citedwebsite.com/exmpl [Accessed 1.1.2004]”; WebCite®; (optional) DOI assignment; DOI® server; Link Resolver; Reader; Retrieval Request (DOI with Hash); Snapshot; mirrors; CrossRef® Forward Linking; Publisher/Editor (/comb, /archive; XML Manuscript with DOI®); Third-party archiving: Libraries/Digital Preservation Partners, IA. © WebCite®]

Slide 22: [Diagram, citing-author part of the architecture. Labels: Citing Author (/archive, /bookmarklet, /comb); sample citing paper “What the world needs” by J. Author with reference “8. Doe J. www.citedwebsite.com/exmpl [Accessed 1.1.2004]”; WebCite®; Snapshot; Reader; Retrieval Request; mirrors; Third-party archiving: Libraries/Digital Preservation Partners, IA. © WebCite®]

Slide 25: Two possible citation formats to cite the WebCite snapshot • Opaque (ID-based): Eysenbach, Gunther. Gunther Eysenbach Random Research Rants Blog. 2008-06-26. URL:http://gunther-eysenbach.blogspot.com. Accessed: 2008-06-26. (Archived by WebCite® at http://www.webcitation.org/5YreMGRz7) • Transparent: Eysenbach, Gunther. Gunther Eysenbach Random Research Rants Blog. 2008-06-26. http://www.webcitation.org/query?url=http%3A%2F%2Fgunther-eysenbach.blogspot.com&date=2008-06-26 (Note that there are also others: Hash-based, and citing-document-DOI-based)
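The two link forms on this slide can be assembled mechanically. The following is a minimal sketch, not WebCite's own code: the function names are made up for illustration, and the URL patterns are copied from the two examples on the slide.

```python
from urllib.parse import quote

WEBCITE_BASE = "http://www.webcitation.org"  # host as shown on the slide


def opaque_url(snapshot_id):
    """Opaque (ID-based) link: the snapshot ID is appended to the host."""
    return f"{WEBCITE_BASE}/{snapshot_id}"


def transparent_url(cited_url, date_iso):
    """Transparent link: the cited URL (percent-encoded) and the date stay visible."""
    encoded = quote(cited_url, safe="")
    return f"{WEBCITE_BASE}/query?url={encoded}&date={date_iso}"


def opaque_citation(author, title, published, cited_url, accessed, snapshot_id):
    """Reference string in the opaque style used on the slide."""
    return (
        f"{author}. {title}. {published}. URL:{cited_url}. "
        f"Accessed: {accessed}. (Archived by WebCite® at {opaque_url(snapshot_id)})"
    )


blog = "http://gunther-eysenbach.blogspot.com"
print(transparent_url(blog, "2008-06-26"))
print(opaque_citation("Eysenbach, Gunther",
                      "Gunther Eysenbach Random Research Rants Blog",
                      "2008-06-26", blog, "2008-06-26", "5YreMGRz7"))
```

The transparent form keeps the original URL recoverable from the citation itself, while the opaque form is shorter but only meaningful while the archive resolves the ID.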

Slide 26: Reader point of view: for retrieving archived material the reader simply clicks on the WebCite link. [Diagram: sample citing paper “What the world needs” by J. Author with reference “8. Doe J. www.webcitation.org?cache_url=www.citedwebsite.com/exmpl&cache_date=31.1.2003 [Accessed 31.1.2004]”] 1. Reader clicks on the cited webcitation-URL (on 1.1.2005) 2. Request is redirected to webcitation.org 3. Webcitation.org attempts to retrieve the “live” cited URL (www.citedwebsite.com/exmpl: ERROR: NOT FOUND); if not found, it displays the cached version (and/or other versions) 4. Displays cached version (timestamp 31.1.2004)
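The retrieval flow on this slide (try the live URL, fall back to the cached copy) can be sketched as follows. This only illustrates the fallback logic: the `NotFound` exception and the two fetcher callables are hypothetical stand-ins, not part of any WebCite interface.

```python
class NotFound(Exception):
    """Hypothetical signal that the live URL no longer resolves."""


def resolve_reference(cited_url, cache_date, fetch_live, fetch_cached):
    """Return (source, content), preferring the live page over the snapshot.

    fetch_live(url) and fetch_cached(url, date) are caller-supplied callables
    (assumptions for this sketch); fetch_live raises NotFound on a dead link.
    """
    try:
        return ("live", fetch_live(cited_url))
    except NotFound:
        # The live page is gone, so serve the archived snapshot instead.
        return ("cached", fetch_cached(cited_url, cache_date))
```

Passing the fetchers in as arguments keeps the sketch free of network access, so the fallback can be exercised with simple stubs.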

Slide 27: Bookmarklet Can be used to rapidly archive the currently viewed webpage (the bookmarklet hands over the current URL and the email address of the citing author to the WebCite server)
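What the bookmarklet hands over amounts to two values, the current URL and the citing author's email address. A rough sketch of how such a request could be assembled is shown below; the `/archive` path appears in the architecture diagram, but the query parameter names `url` and `email` are assumptions made for this illustration.

```python
from urllib.parse import urlencode

# The "/archive" path is taken from the architecture diagram; the parameter
# names "url" and "email" are assumptions for this sketch.
ARCHIVE_ENDPOINT = "http://www.webcitation.org/archive"


def archive_request_url(current_url, author_email, endpoint=ARCHIVE_ENDPOINT):
    """Build the request that passes the viewed URL and the author's email."""
    return endpoint + "?" + urlencode({"url": current_url, "email": author_email})


print(archive_request_url("http://www.citedwebsite.com/exmpl", "author@example.org"))
```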

Slide 28: [Diagram, citing-author and cited-author parts of the architecture. Labels: Reverse (citation-triggered) archiving; Self (author-triggered) archiving; Citing Author (/archive, /bookmarklet, /comb); Cited Author ((dynamic content); /archive (self-archiving) (static content)); sample citing paper “What the world needs” by J. Author with reference “8. Doe J. www.citedwebsite.com/exmpl [Accessed 1.1.2004]”; WebCite®; Snapshot; Reader; Retrieval Request; mirrors; Third-party archiving: Libraries/Digital Preservation Partners, IA. © WebCite®]

Slide 29: As “potentially cited” author I can self-archive and add a static WebCite-enriched reference as citation suggestion…

Slide 30: As “potentially cited” author I can self-archive and add a static WebCite-enriched reference as citation suggestion…

Slide 31: … or I provide a dynamic link to the WebCite archiving form (“WebCite this!”)

Slide 32: … or I provide a dynamic link to the WebCite archiving form (“WebCite this!”)

Slide 33: Clicking on “WebCite this” populates the archiving form with metadata from the cited author

Slide 35: (the same approach can be used by authors of wikis, datasets etc.)

Slide 36: Implementation from a publisher / editor point of view

Slide 37: Level 1-4 implementation (levels ordered along the axis “Time since author saw the cited webdocument”) • Level 1: Author “webcites” document immediately (or reference manager takes care of this); editors stipulate this in their Instructions for authors • Level 2: Editor/Copyeditor “webcites” cited document before publication • Level 3: WebCite® immediately archives cited webreferences on publication (combing XML files) • Level 4: Retrospective focussed crawling of old articles

Slide 39: Level 1-Implementation by journal editors: Instructions for authors

Slide 40: [Diagram, publisher/editor part of the architecture. Labels: Reverse (citation-triggered) archiving; Self (author-triggered) archiving; Citing Author (/archive, /bookmarklet, /comb); sample citing paper “What the world needs” by J. Author with reference “8. Doe J. www.citedwebsite.com/exmpl [Accessed 1.1.2004]”; WebCite®; Publisher/Editor (/comb, /archive; XML Manuscript with DOI®); CrossRef® Forward Linking; mirrors; Third-party archiving: Libraries/Digital Preservation Partners, IA. © WebCite®]

Slide 42: Implemented by >200 journals

Slide 43: What’s next Future developments

Slide 44: WebCite 2.0 • User accounts • Enables users to view a list of the snapshots they created (and to categorize and export them e.g. in BibTex, Refman etc.) • Enables tagging, “crowdsourcing” of curation tasks such as metadata entering & reconciliation • Recommender service (people who cited x also cited y) • Post-publication peer-review (others can rate documents) • For cited authors – WebCite® Impact Factor (access / citation statistics, which can be used for tenure & promotion purposes) – WebCitation-Alert service
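The recommender idea (“people who cited x also cited y”) boils down to counting co-citations. The sketch below is one simple way to do that and is not a description of any planned WebCite implementation; the input structure, a mapping from citing document to the set of URLs it cites, is assumed for illustration.

```python
from collections import Counter


def also_cited(citations, target):
    """Rank URLs by how often they are cited together with `target`.

    `citations` maps a citing document (any hashable label) to the set of
    URLs it cites; this structure is an assumption of the sketch.
    """
    counts = Counter()
    for cited_urls in citations.values():
        if target in cited_urls:
            counts.update(url for url in cited_urls if url != target)
    return counts.most_common()


papers = {
    "paper-a": {"x.org", "y.org", "z.org"},
    "paper-b": {"x.org", "y.org"},
    "paper-c": {"z.org"},
}
print(also_cited(papers, "x.org"))
```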

Slide 46: Implementation of WebCite® in tools facilitating “archive as you cite” • Bibliographic management systems (EndNote, Reference Manager) and shared bookmarks (Connotea, CiteULike) • XML-editing software (Word 2007 XML add-in, Lemon8 etc.) • Plugin for OJS and other manuscript management systems (allowing authors to automatically WebCite all references in their manuscript)

Slide 47: WebCite® works within the International Internet Preservation Consortium (IIPC) • Collect and preserve a rich body of Internet content from around the world • To foster the development and use of common tools, techniques and standards that enable the creation of international archives • To encourage and support national libraries everywhere to address Internet collecting and preservation http://netpreserve.org

Slide 48: 2008 IIPC Members (38) • Asia: Jewish National and University Library (Israel); National Diet Library, Japan; National Library Board, Singapore; National Library of China • Europe: Biblioteca de Catalunya (Library of Catalonia); Biblioteca Nazionale Centrale di Firenze (National Library of Italy, Florence); Biblioteka Narodowa (National Library of Poland); Bibliotheque nationale de France (National Library of France); British Library (U.K.); Deutsche Nationalbibliothek (German National Library); European Archive Foundation; Hanzo Archives Ltd. (U.K.); Kansalliskirjasto (National Library of Finland); Koninklijke Bibliotheek (National Library of the Netherlands); Kungl. biblioteket (National Library of Sweden); Landsbokasafn Islands – Haskolabokasafn (National and University Library of Iceland); Latvijas Nacionālā bibliotēka (National Library of Latvia); Nacionalna i sveučilišna knjižnica u Zagrebu (National and University Library in Zagreb, Croatia); Narodna in univerzitetna knjižnica (National and University Library, Slovenia); Národní knihovna České republiky (National Library of the Czech Republic); National Archives (U.K.); National Library of Scotland; Netarchive.dk (Royal Library and the State and University Library, Aarhus); Österreichische Nationalbibliothek (Austrian National Library); Schweizerische Nationalbibliothek (Swiss National Library); Virtual Knowledge Studio – Royal Netherlands Academy for Arts and Sciences • North America: Bibliothèque et Archives Nationales du Québec (BAnQ); California Digital Library (U.S.); Centre for Global eHealth Innovation, WebCite® Internet Citations Archiving Project (Canada); Internet Archive (U.S.); Library and Archives Canada; Library of Congress (U.S.); Library of Virginia (U.S.); United States Government Printing Office; University of North Texas Libraries (U.S.) • Oceania: National Library of Australia; National Library of New Zealand

Slide 49: The vision • A global infrastructure (standard APIs) – for cross-archive searching of cited URLs (by URL & date) – Decentralized storing of archived webmaterial • Pilot project with WebCite®, Internet Archive, and Library and Archives Canada
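Cross-archive searching by URL and date could, for example, mean picking the snapshot closest to the cited date from whatever each archive holds. The sketch below illustrates that selection step only; the `Snapshot` record and the in-memory list standing in for responses from several archives are hypothetical, since the slide does not specify the APIs.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    archive: str    # which archive holds the copy (label is illustrative)
    url: str        # the originally cited URL
    captured: date  # when the snapshot was taken


def closest_snapshot(snapshots, url, cited_on):
    """Return the snapshot of `url` captured nearest to `cited_on`, or None."""
    candidates = [s for s in snapshots if s.url == url]
    if not candidates:
        return None
    return min(candidates, key=lambda s: abs((s.captured - cited_on).days))


holdings = [
    Snapshot("archive-a", "http://www.citedwebsite.com/exmpl", date(2003, 6, 1)),
    Snapshot("archive-b", "http://www.citedwebsite.com/exmpl", date(2004, 1, 3)),
]
print(closest_snapshot(holdings, "http://www.citedwebsite.com/exmpl", date(2004, 1, 1)))
```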

Slide 50: Summary: What WebCite® contributes • Links/URLs no longer go 404 (dead) • WebCite’d content does not change • Internet material can be deemed citable and “archived” – Encourages “openness” (authors contribute to blogs, wikis etc., and make their datasets available) – Takes the submission load off journals – much of the scholarly communication can take place outside of journals • Provides access/impact statistics for cited authors • Enables one-click self-archiving • “Internet Archiving 2.0”: Enables archiving of the “hidden/deep web” (where crawlers cannot go), collaborative assignment of metadata

Slide 51: Call for action • If you are a citing author: use WebCite next time you cite a non-journal URL • If you are a blogger or a (potentially cited) author publishing online in any other way, put a “WebCite this!” link on your page • If you are an editor/publisher: implement WebCite in your workflow (instructions for authors, copyeditors, XML production department) • If you are a librarian: contact us to become a long-term preservation partner

Slide 52: www.medicine20congress.com, Toronto, Sept 4-5th, 2008

Slide 53: Thank you! Dr G. Eysenbach, Email: geysenba at uhnres.utoronto.ca or @gmail.com • My peer-reviewed Journal: http://www.jmir.org • My Blog: http://gunther-eysenbach.blogspot.com • My Conferences: http://www.medicine20congress.com http://www.ehealthcongresss.org • My Slides: http://www.slideshare.net/eysen • Funding: Change Foundation, Canadian Institutes for Health Research, NSERC, European Union, SSHRC

Slide 54: Appendix

Slide 55: Copyright Issues • WebCite® honors robot exclusion standards and “no-archive” tags • Copyright holders can request removal of material • “Fair use” defence (used for non-profit/scholarly purposes, only a part of the site was archived, etc.) • U.S. court ruled that Google’s caching does not constitute a copyright violation, because of fair use and an implied license (Field vs Google, US District Court, District of Nevada, CV-S-04-0413-RCJ-LRL) • In the future, WebCite® may also – Allow copyright holders to specify a per-access royalty fee – Long-term goal: WebCite® does not physically store anything but instead deposits the material in the respective National Libraries etc., who often have a legal deposit mandate* (*Legal deposit: a copy of any work published in COUNTRY must be deposited with the National Library of COUNTRY)
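Honoring robot exclusion and “no-archive” tags can be checked before a snapshot is taken. The following sketch uses Python's standard `urllib.robotparser` and `html.parser`; the user-agent string is a placeholder, and it is an assumption that the “no-archive” tag means the conventional `<meta name="robots" content="noarchive">` element. How WebCite actually performs these checks is not stated in the slides.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _RobotsMetaParser(HTMLParser):
    """Records whether a robots meta tag carrying 'noarchive' is present."""

    def __init__(self):
        super().__init__()
        self.noarchive = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").lower()
        if name == "robots" and "noarchive" in content:
            self.noarchive = True


def may_archive(url, robots_txt, html, user_agent="ExampleArchiver"):
    """True if neither robots.txt nor a noarchive meta tag forbids archiving.

    `user_agent` is a placeholder name, not the real crawler's identifier.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        return False
    meta = _RobotsMetaParser()
    meta.feed(html)
    return not meta.noarchive
```

The function takes the robots.txt text and page HTML as arguments so that the decision logic can be tested without fetching anything.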

Slide 56: WebCite® is a disruptive technology • If online articles/material are – Permanently archived and “citable” – Findable – “Rankable” (post-publication peer-review) – (all of which WebCite® plans to implement) • … what will be the role of the traditional scholarly journal publication? – Quality of pre-publication peer-review, editing, copyediting is key – Value-added services (e.g. semantic markup, curation)

Slide 57:
<ref id="ref19">
  <label>19</label>
  <nlm-citation citation-type="web">
    <article-title>Who Gets ALS</article-title>
    <source>ALS Association</source>
    <access-date>2008 Apr 25</access-date>
    <comment>
      <ext-link xlink:type="simple" xlink:href="http://www.alsa.org/als/who.cfm" ext-link-type="uri">http://www.alsa.org/als/who.cfm</ext-link>
    </comment>
    <pub-id pub-id-type="other">5Y0NuDIU9</pub-id>
  </nlm-citation>
</ref>
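“Combing” article XML for web references like the one above could look roughly like this. It is a sketch under assumptions: the surrounding `<ref-list>` element and the `xmlns:xlink` declaration are added here so the fragment parses on its own, and the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# The <ref> element is from the slide; the <ref-list> wrapper and the
# namespace declaration are assumptions added so the sample is parseable.
SAMPLE = """<ref-list xmlns:xlink="http://www.w3.org/1999/xlink">
  <ref id="ref19">
    <label>19</label>
    <nlm-citation citation-type="web">
      <article-title>Who Gets ALS</article-title>
      <source>ALS Association</source>
      <access-date>2008 Apr 25</access-date>
      <comment>
        <ext-link xlink:type="simple" xlink:href="http://www.alsa.org/als/who.cfm" ext-link-type="uri">http://www.alsa.org/als/who.cfm</ext-link>
      </comment>
      <pub-id pub-id-type="other">5Y0NuDIU9</pub-id>
    </nlm-citation>
  </ref>
</ref-list>"""


def comb_web_references(xml_text):
    """Collect URL, access date and pub-id of every web-type citation."""
    root = ET.fromstring(xml_text)
    found = []
    for ref in root.iter("ref"):
        for citation in ref.iter("nlm-citation"):
            if citation.get("citation-type") != "web":
                continue
            link = citation.find(".//ext-link")
            if link is None:
                continue
            found.append({
                "ref_id": ref.get("id"),
                "url": link.get("{%s}href" % XLINK_NS),
                "access_date": citation.findtext("access-date"),
                "pub_id": citation.findtext("pub-id"),
            })
    return found


print(comb_web_references(SAMPLE))
```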

Slide 58:
<ref id="ref19">
  <label>19</label>
  <nlm-citation citation-type="web">
    <article-title>Who Gets ALS</article-title>
    <source>ALS Association</source>
    <access-date>2008 Apr 25</access-date>
    <comment>
      <ext-link xlink:type="simple" xlink:href="http://www.webcitation.org/query?url=http://www.alsa.org/als/who.cfm&amp;date=2008-04-25" ext-link-type="uri">http://www.webcitation.org/query?url=http://www.alsa.org/als/who.cfm&amp;date=2008-04-25</ext-link>
    </comment>
  </nlm-citation>
</ref>

Slide 59: [Diagram of the WebCite® architecture, repeated from Slide 21. Labels: Reverse (citation-triggered) archiving; Self (author-triggered) archiving; Citing Author (/archive, /bookmarklet, /comb); Cited Author ((dynamic content); /archive (self-archiving) (static content)); sample citing paper “What the world needs” by J. Author with reference “8. Doe J. www.citedwebsite.com/exmpl [Accessed 1.1.2004]”; WebCite®; (optional) DOI assignment; DOI® server; Link Resolver; Reader; Retrieval Request (DOI with Hash); Snapshot; mirrors; CrossRef® Forward Linking; Publisher/Editor (/comb, /archive; XML Manuscript with DOI®); Third-party archiving: Libraries/Digital Preservation Partners, IA. © WebCite®]