"Creating and Maintaining Web Archives"
Presented by Joanne Archer (University of Maryland), Tessa Fallon (Columbia University), Abbie Grotke (Library of Congress), and Kate Odell (Internet Archive)
ALA 2013 Presentation, co-authored/co-presented by Tyler Mobley.
Typo in OpenWMS slide. Should say "Maps from MODS, MARC, in-house text". Will correct when I get a chance.
EZID makes it simple for researchers and others to obtain and manage long-term identifiers for their digital content. The service can create and resolve identifiers, and it also allows entry and maintenance of information about the identifier (metadata). This presentation was given as part of a webinar series.
To facilitate data sharing from within the University of California system and beyond, the University of California Curation Center (UC3) is developing a new ingest and discovery layer for our data curation service, Dash. Dash uses the Merritt repository for preservation and a self-service overlay layer for submission and discovery of research datasets. The new overlay, dubbed Stash (STore And SHare), will feature an enhanced user interface with a simple and intuitive deposit workflow, while still accommodating rich metadata. Stash will enable individual scholars to upload data through local file browse or drag-and-drop operation; describe data in terms of scientifically meaningful metadata, including methods, references, and geospatial information; identify datasets for persistent citation and retrieval; preserve and share data in an appropriate repository; and discover, retrieve, and reuse data through faceted search and browse. Stash can be implemented in conjunction with any standards-compliant repository that supports the SWORD protocol for deposit and the OAI-PMH protocol for metadata harvesting. Stash will feature native support for the DataCite or Dublin Core metadata schemas, but is designed to accommodate other schemas to support discipline-specific applications. By alleviating many of the barriers that have historically precluded wider adoption of open data principles, Stash empowers individual scholars to assert active curation control over their research outputs; encourages more widespread data preservation, publication, sharing, and reuse; and promotes open scholarly inquiry and advancement.
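As a rough illustration of the OAI-PMH harvesting mentioned above, the sketch below builds a ListRecords request URL. The `verb`, `metadataPrefix`, and `set` parameters and the `oai_dc` prefix come from the OAI-PMH specification; the base URL and the function name are hypothetical and do not describe Stash or Merritt.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url is a hypothetical repository endpoint supplied by the caller;
    oai_dc is the Dublin Core prefix every OAI-PMH repository must support.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional selective harvesting by set.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder endpoint, for illustration only.
    print(list_records_url("https://repository.example.org/oai"))
```

A harvester would fetch this URL and follow any resumption tokens in the response; that part is omitted here.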
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
Robert H. McDonald
This is the slidedeck for my ACRL 2015 TechConnect Presentation with Nicole Vasilevsky (OHSU). For more on the program, see http://bit.ly/1xcQbCr.
NCompass Live - January 2, 2014.
http://nlc.nebraska.gov/ncompasslive/
The Bibliographic Framework Initiative, or BIBFRAME, is intended to provide a replacement for the MARC format as an encoding standard for library catalogs. Its aim is to move library data into a Linked Data format, allowing it to interact with other data on the Web. In this session, Emily Nimsakont, the NLC’s Cataloging Librarian, will cover the basics of BIBFRAME, describe what it can provide for users of library catalogs that MARC can’t, and outline what librarians should be aware of regarding this change in the cataloging landscape.
This presentation was given by Carl Stahmer of UC-Davis during the NISO Virtual Conference, BIBFRAME & Real World Applications of Linked Bibliographic Data, held on June 15, 2016.
The modern library web environment consists of multiple content sources and applications that perform essential functions that often overlap and could potentially create a fractured user experience. For example, content in a library’s Drupal website may be replicated in LibGuides or WordPress blogs. Search functionality in a discovery platform may be replicated in a federated search tool or the ILS OPAC. This presentation provides tips, tackles technical and political challenges to building a single web experience for users, discusses solutions and use of APIs (application programming interfaces), provides concrete examples, and more.
Presentation & Discussion with focus on GERMAN NATIONAL LIBRARY OF MEDICINE ZB MED Strategic plans. Cologne
December 8th 2010
Guus van den Brekel (@digicmb)
Central Medical Library, UMCG
http://digicmb.blogspot.com/2010/12/german-national-library-of-medicine-zb.html
Considerations for Your Mobile Library
Rachel Vacek
The ubiquity of mobile devices has changed how people access information, and users expect libraries to provide mobile interfaces to that information. In this session, learn about the benefits and drawbacks of building a mobile website versus building a mobile application and get ideas for innovative services and tools for your library’s mobile environment.
Persistent Identifiers, Herbarium workshop at Kongsvold, September 1 to 4, 2014
Dag Endresen
Implementation of persistent and globally unique identifiers for specimens held in natural history collections worldwide will open up new opportunities for referring to these physical resources in an interlinked digital context such as the Internet. Here, we will describe the approach for persistent identification of collection specimens developed and implemented at the Natural History Museum in Oslo (NHM-UiO) by the Norwegian participant node to the Global Biodiversity Information Facility (GBIF-Norway). The Norwegian university museums are invited to use our resolver service at "http://purl.org/gbifnorway/id/<uuid>" when publishing biodiversity data to GBIF. All occurrence records published through GBIF-Norway, with appropriate PURL-UUID identifiers mapped to the Darwin Core occurrenceID, will automatically be added to our resolver service and kept updated.
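The PURL-UUID pattern described above can be sketched as follows. The resolver prefix and the Darwin Core term occurrenceID are from the abstract; the helper name, the use of a random (version 4) UUID, and the sample record are assumptions made for illustration, not GBIF-Norway's actual tooling.

```python
import uuid

# Resolver prefix quoted in the abstract above.
RESOLVER_BASE = "http://purl.org/gbifnorway/id/"


def occurrence_id(specimen_uuid=None):
    """Return a PURL-UUID identifier for one specimen record.

    Hypothetical helper: mints a random UUID when none is supplied
    (an assumption), otherwise validates and normalizes the given one.
    """
    if specimen_uuid:
        value = uuid.UUID(str(specimen_uuid))  # raises ValueError if malformed
    else:
        value = uuid.uuid4()
    return RESOLVER_BASE + str(value)


if __name__ == "__main__":
    # Illustrative record; only occurrenceID is taken from the abstract.
    record = {"occurrenceID": occurrence_id(), "basisOfRecord": "PreservedSpecimen"}
    print(record["occurrenceID"])
```

The important property is that the UUID is assigned once and stored with the specimen record, so the same identifier is published every time.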
Capture All the URLS: First Steps in Web Archiving
Kristen Yarmey
Presentation with Judy Silva (Fine & Performing Arts Librarian and Archivist at Slippery Rock University) and Alexis Antracoli (Records Management archivist at Drexel University) at the Pennsylvania Library Association's 2013 annual conference in Seven Springs, Pennsylvania.
Abstract: As higher education embraces new technologies, teaching, learning, research, and record-keeping are increasingly taking place on university websites, on university-related social media pages, and elsewhere on the open web. This dynamic digital content, however, is highly vulnerable to degradation and loss. This session will introduce the concept of web archiving and articulate why it’s important for colleges and universities. Speakers will demonstrate web archiving service Archive-It and then share lessons learned from their institutions’ web archiving initiatives, from unexpected stumbling blocks to strategies for raising funds and support from campus stakeholders.
Library discovery: past, present and some futures
lisld
A presentation at the NISO virtual conference on Webscale Discovery Services, 20 November 2013.
Considers some of the issues that have led to the adoption of these services, and some future directions.
Distinguishes between discovery (providing a library destination) and discoverability (making stuff discoverable elsewhere).
web 2.0, library systems and the library system
lisld
The Web 2.0 environment is characterized by concentration and diffusion. Library services are not well matched to this environment: they are fragmented and difficult to mobilize in user workflows. This presentation analyzes this situation and suggests some directions.
Rich Media Hoarders session for 24HourPhotoshop
Extensis
Breathe in, and breathe out…and relax! Getting organized doesn’t need to be stressful; in fact, it helps reduce stress in the long run. Getting organized also means being able to quickly find the files you need, or even better, letting other people find the files instead of you. In this session you’ll learn practical tips on file naming, folder organization, keywords, and other organizational tricks you can start using immediately with or without an actual DAM system.
Beyond MARC: BIBFRAME and the Future of Bibliographic Data
Emily Nimsakont
The Bibliographic Framework Initiative, or BIBFRAME, is intended to provide a replacement for the MARC format as an encoding standard for library catalogs. Its aim is to move library data into a Linked Data format, allowing it to interact with other data on the Web. In this session, Emily Nimsakont, the NLC’s Cataloging Librarian, will cover the basics of BIBFRAME, describe what it can provide for users of library catalogs that MARC can’t, and outline what librarians should be aware of regarding this change in the cataloging landscape.
OCLC Research Update at ALA Chicago. June 26, 2017.
OCLC
Rachel Frick, OCLC Executive Director of the OCLC Research Library Partnership, reviews some of the broad agenda items and recent publications related to the work of OCLC Research. Rachel is then joined for two presentations on specific research topics. First, Sharon Streams (OCLC Director of WebJunction) and Monika Sengul-Jones (OCLC Wikipedian-in-Residence) present on “Public Libraries and Wikipedia.” Next, Kenning Arlitsch (Dean, Montana State University Library) and Jeff Mixter (OCLC Senior Software Engineer) share their findings on “Accurate Institutional Repository Download Measurement using RAMP, the Repository Analytics and Metrics Portal.”
CDL has recently launched a new project dubbed Digital Curation for Excel (DCXL), funded by the Gordon and Betty Moore Foundation and Microsoft Research. The goal of the DCXL project is to facilitate data management, sharing, and archiving for earth, environmental, and ecological scientists. The main result from the project will be an open source add-in for Microsoft Excel that will assist scientists in preparing their Excel data for sharing.
Online Collections Crawlability for Libraries, Archives, and Museums
mherbison
The Goal is Crawlability.
Allow and encourage webcrawlers to access everything on your website that you want users to be able to find.
(1) If webcrawlers can’t get to your stuff...
(2) Search engines won’t index your stuff...
(3) Your stuff won’t turn up in users’ web searches...
(4) Users won’t find your stuff!
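One quick way to check step (1) is to test a site's robots.txt rules against the pages you want found. The sketch below uses Python's standard urllib.robotparser; the sample rules, paths, and the function name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, user_agent="*"):
    """Return the paths that the given robots.txt text disallows.

    robots_txt is the file's content as a string; paths are site-relative
    URLs you want crawlers to reach. Both are supplied by the caller.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


if __name__ == "__main__":
    # Hypothetical rules that hide an entire collections area from crawlers.
    sample = "User-agent: *\nDisallow: /collections/\n"
    print(blocked_paths(sample, ["/collections/item/42", "/about"]))
```

Any path reported here is invisible to well-behaved crawlers, so it will not reach steps (2) through (4).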
Data “publication” attempts to appropriate for data the prestige of publication in the scholarly literature. While the scholarly communication community substantially endorses the idea, it hasn’t fully resolved what a data publication should look like or how data peer review should work. To contribute an important and neglected perspective on these issues, we surveyed ~250 researchers across the sciences and social sciences, asking what expectations “data publication” raises and what features would be useful to evaluate the trustworthiness and impact of a data publication and the contribution of its creator(s).
In early 2014, we asked science and social science researchers...
• What expectations do the terms publication and peer review raise in reference to data?
• What features would be useful to evaluate the trustworthiness, evaluate the impact, and enhance the prestige of a data publication?
Although there is consensus that datasets should be treated like “first class” research objects in how they are discovered, cited, and recognized, this is still far from a reality. Datasets are poorly indexed by search engines, and they are rarely cited in formal reference lists. A solution that a number of journals are implementing is to publish discovery and citation proxy objects in the form of peer-reviewed “data papers.” A strength of this approach is that it requires dataset creators to write up rich and useful metadata for the paper, but an accompanying weakness is that busy creators are not always willing to invest the necessary time and energy. To enhance dataset discoverability without burdening creators, EZID (easy-eye-dee) will begin using dataset metadata to automatically generate lightweight, non-peer reviewed publications that will increase the exposure of the metadata to search engines. EZID (ezid.cdlib.org) maintains public DataCite metadata records for over 167,000 datasets, any of which could be viewed as HTML or as a dynamically generated PDF. In cases where the creator has submitted only the required DataCite metadata, the document will function as a cover-sheet or landing page. If the creator chooses to submit optional Abstract and Methods metadata (over 2,000 records already contain Abstracts), the document expands to more closely resemble a traditional journal article, while retaining the linking functionality of a landing page. A potential bonus is that providing an incrementally improved document in exchange for the effort of submitting incrementally improved metadata may encourage authors to submit more than the minimum required metadata.
Software development should build on the successful work of others. The DMPTool helps researchers with data management planning, but what about other phases of the data life cycle? In this webinar, we will discuss what software integration with the DMPTool might look like, and why it is important. Topics include:
1. Background: why tools integration is important; why we are talking about this in terms of the DMPTool.
2. Details and plans for DMPTool2 regarding software integration and compatibility.
3. Future possibilities for software integration for DMPTool2
4. Example of successful integration of tools: work at the Center for Open Science.
Data management plans (DMPs) existed long before the NSF started requiring them. DMPs have inherent value despite being relatively unknown to researchers until now. Proper, thorough data management plans are potentially a major time saver and a huge asset for the project. In this webinar, we will cover how to go beyond funder requirements and develop more thorough DMPs. The Gulf of Mexico Research Initiative requires an extensive data management plan for projects it funds; we will hear about their efforts and how they are planning to use the DMPTool going forward.
Future of web archiving
1. Future of Web Archiving
Stephen Abrams
California Digital Library
Martin Klein
Los Alamos National Laboratory
Jimmy Lin
University of Maryland
Michael Nelson
Old Dominion University
Digital Preservation 2014, Washington, July 22-24
2. www.flickr.com/photos/adesigna/4090782772
Agenda
Web archiving problems and opportunities
Memento tools
WarcBase platform
Assessing quality of archives
Discussion
3. Web archiving is important but (really) hard
Why web archiving?
Continuation of longstanding mission to
collect, preserve, and provide access to the
scholarly record and our cultural heritage
Publishing/dissemination platform of
choice
But …
www.flickr.com/photos/alaig/3522953697
www.flickr.com/photos/hier_gibt_es_nichts_zu_sehen_bitte_gehen_sie_weiter/840587382
the web isn’t the web anymore
4. Web in transition
Document retrieval
Document viewer
HTML
Common
Desktop
Information
Programming environment
Virtual machine
JavaScript
Personalized
Mobile/handheld/wearable
Things
www.flickr.com/photos/swamibu/2223726960 www.flickr.com/photos/sharples/79222765
“A ‘web’ of notes with links (like
references) between them …”
– Tim Berners-Lee, March 1989
5. (Some) other issues
Crawlers don’t act like browsers
► Need robots that act more like people
www.flickr.com/photos/benhusmann/5126030385
6. (Some) other issues
Crawlers don’t act like browsers
Responsiveness to time-sensitive content
► Need to bypass v-e-r-y deliberate collection development
procedures
Guardian News and Media Limited
9. (Some) other issues
Crawlers don’t act like browsers
Responsiveness to time-sensitive content
Policies, rights, and permissions
Difficult integration into traditional management
and discovery services
Siloed collections
www.flickr.com/photos/54159370@N08/7148880783
11. Supporting research
Little awareness in the scholarly community
Poorly understood use cases
Few tools
Traditional find→download→manipulate locally
workflows may not be feasible at web scale
► Need APIs and business models for in situ analysis
berkeley.edu/teach www.flickr.com/photos/infocux/8450190120
12. Technological opportunities
Better capture mechanisms
► Headless browsers
► API harvesters
…
Better discovery modalities
► Browsing the past should be as
simple and intuitive as the now
…
www.flickr.com/photos/bartelomeus/4184705426
www.flickr.com/photos/shebalso/6357626617
13. Cooperative opportunities
Complementary collection development
Coordinated infrastructure support and operation
► Or perhaps centralized – a HathiTrust for web archives?
Crowd sourcing selection, description, quality
assurance
www.flickr.com/photos/chiotsrun/4115059294 www.flickr.com/photos/sagesolar/9230445157
First of all, why is web archiving important?
As members of memory institutions, it is the continuation in a new technological context of our longstanding mission and obligation to collect, preserve, and provide access to the scholarly record and our collective cultural heritage.
Since the web is where the content is, that is where we have to go to acquire it.
But the fundamental problem is that the web is not the web anymore.
As soon as you think you have quantified or characterized it, it has changed into something else; and as soon as you have processes in place to capture web content, the content is not available in the same way.
What a tangled web we weave, https://www.flickr.com/photos/alaig/3522953697
Thorsten Hartmann, Untitles, https://www.flickr.com/photos/hier_gibt_es_nichts_zu_sehen_bitte_gehen_sie_weiter/840587382
It’s different from what anyone – Tim Berners-Lee included – had in mind 25 years ago.
The web is no longer a giant document retrieval system, but a programming environment.
The browser is no longer a document viewer, but a general-purpose virtual machine; its fundamental language is no longer HTML but JavaScript.
The mode of experience has shifted from a common to a highly personalized one; whose web are we archiving?
Crumbled paper, https://www.flickr.com/photos/84564583@N08/11167321155
The great pyramid: Size matters, https://www.flickr.com/photos/swamibu/2223726960
A pile of rocks, https://www.flickr.com/photos/sharples/79222765
Paywalls, robot exclusions, crawler traps, … What we need is a collection mechanism that acts like a person
Ben Husmann, The FREE HUGS robot says "I am here for you", https://www.flickr.com/photos/benhusmann/5126030385
Event-driven content doesn’t mesh well with established – meaning v-e-r-y deliberate – collection development processes
Search is simple if you know the URL
Hossam el-Hamalawy, Tahrir Square, https://www.flickr.com/photos/elhamalawy/6378330927
U Can’t Touch This, https://www.flickr.com/photos/vblibrary/7414544704
Dan Storey, Square peg in a round hole, https://www.flickr.com/photos/21664580@N04/2095574414
How to find enough good people? (We’re hiring!)
“You’re collecting that?”
May need programmatic or API access to in situ collection analysis
Headless browsers (PhantomJS, Umbra, etc.), API harvesters
Make browsing the past web as simple and intuitive as browsing the live web
Net casting at disk Contarf Pelican Park, https://www.flickr.com/photos/shebalso/6357626617
Bart van de Biezen, Goed Zoekveld, https://www.flickr.com/photos/bartelomeus/4184705426
Avoid needless duplication of effort
As librarians we have historically given perhaps inordinate priority to content creators and curators and not enough to consumers. But over significant timespans it is the users who affirmatively seek out and exploit content who may be best positioned to contribute towards its successful management.
Meyer lemons, https://www.flickr.com/photos/chiotsrun/4115059294
We sit in the shade and drink lemonade, https://www.flickr.com/photos/sagesolar/9230445157
Michael Harries, Drawing back the curtain, http://cdn.ws.citrix.com/wp-content/uploads/2012/05/iStock_000010348904XSmall.jpg