End of Term Harvest User Interface

Screen images of prototype for 2008-2009 US Federal Government web archive using XTF to view content at the Internet Archive.

  1. Slides used for CDL staff meeting demo of End of Term Harvest prototype
  2. End-of-Term Harvest
  3. U of North Texas hosted Nomination Tool
  4. The Content
     25 terabytes of federal government websites
     Captured by Internet Archive, CDL and University of North Texas: August 2008-August 2009.
     All content at Internet Archive: keyword search, URL lookup.
  5. CDL (Tracy) and IA (Kris Carpenter Negulescu) charged with providing public access interface to collection
     1st Tuesday of each month, 12:00 pm:
     Anything to report on public access?
     No, nothing to report on public access.
  6. Also of note
     Internet Archive experiments with generating MODS records for web archived content
     Uses Selenium to take snapshot images of archived content for QA
     International Internet Preservation Consortium Access Working Group looking for some means of implementing cross-archive public access
  7. Then:
     Martin Haye demonstrates XTF front-end to content in Merritt repository.
     Tracy thinks “hmmm….”
  8. Demo
  9. Slides used to show CDL publishing group (XTF developers)
     Minor changes to implement in navigation
     Clear distinction between searching full text and searching metadata
     Confirming that API for IA search results can be integrated (yep)
     Some screens are images of functioning interfaces, others are images of wireframes. Fully functional screens are noted.
  10. Abbie G. will work on text / layout of home page
  11. Each organization can send some text to be included here.
  12. Search vs. browse
      Keep ‘search full text’ and ‘search metadata’ as distinct as possible.
      Brief results will look different. Full text search results will have no thumbnail and likely different metadata.
      Document display will behave differently. Full text search results will render an archived page. Displaying a browse result will lead to the date navigation for all versions of the site’s home page.
  13. Search vs. browse contd.
      Will not attempt to integrate the two kinds of searching on one screen or within one result set.
      Search tab devoted strictly to full text search of IA content
      Browse tab contains a ‘lookup’ that lets you search against site metadata.
  14. Results come back via API for full text search at IA.
  15. Except for top navigation bar, this is currently fully functional
  16. Except for top navigation bar, this is currently fully functional
  17. Browse by URL not yet functioning
      Expect to add facet by domain (.gov, .edu, etc.)
  18. Metadata lookup: except for top navigation bar, this is currently fully functional
  19. Similar to ‘browse by URL’
      The current interface for the nomination tool that lets curators describe and nominate the ~5000 government websites.
      Potential to add a browse domain feature that lets you do something like this.
  20. Other expected elements
      creator
        The agency name
        Include as browse page and facet
      Provenance[2]
      Domain (.gov, .mil, etc.)
        Derive from URL
        Facet for at least the browse by URL screen
  21. The record needed:
      <title>Visualization of Remote Sensing Data</title>
      <creator>Agency name information if you have it</creator>
      <identifier>http://crawls-wm.us.archive.org/eot08/*/rsd.gsfc.nasa.gov/</identifier>
      <provenance>http://rsd.gsfc.nasa.gov/</provenance>
      <date>2008-09-16</date>
      <date>2009-08-14</date>
      <description>The Vis/RSD website is a showcase for stunning visualizations of satellite data by NASA's Goddard Laboratory for Atmospheres.</description>
      <subject>remote sensing</subject>
      <subject>data</subject>
      <subject>satellite</subject>
      <subject>radar</subject>
      <subject>telescope</subject>
      <subject>earth</subject>
      <subject>space</subject>
      <subject>land</subject>
      <subject>oceans</subject>
      <subject>science education</subject>
      <subject>land use</subject>
      <subject>NASA</subject>
      <type>web site</type>
      <format>text</format>
      <coverage>Executive</coverage>
      <relation>http://crawls.archive.org/collections/eot08/</relation>
      Other elements used, such as source for URL segments, are extracted from this data as they come into XTF. We don’t need a root dc element or namespace reference, but records would still work if IA had a reason to include them.
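
The slides say the domain facet is derived from the URL and that records arrive without a root element. A minimal Python sketch of how that could work is given here, purely as an illustration: it is not the XTF implementation, and the wrapper element name `record` and the helper names `parse_record` and `domain_facet` are hypothetical. The sample data is an abbreviated copy of the record on slide 21.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Abbreviated copy of the slide 21 record. Note it has no root element.
RECORD = """
<title>Visualization of Remote Sensing Data</title>
<identifier>http://crawls-wm.us.archive.org/eot08/*/rsd.gsfc.nasa.gov/</identifier>
<provenance>http://rsd.gsfc.nasa.gov/</provenance>
<date>2008-09-16</date>
<date>2009-08-14</date>
<subject>remote sensing</subject>
<subject>NASA</subject>
<coverage>Executive</coverage>
"""


def parse_record(fragment):
    """Parse a rootless record into a dict mapping tag name to a list of values."""
    # Assumption: wrap the fragment in a synthetic <record> root so that a
    # standard XML parser accepts it. The element name is made up here.
    root = ET.fromstring("<record>" + fragment + "</record>")
    fields = {}
    for child in root:
        # Repeated elements such as <date> and <subject> are collected in order.
        fields.setdefault(child.tag, []).append((child.text or "").strip())
    return fields


def domain_facet(url):
    """Return the top-level domain of a URL with a leading dot, or '' if none."""
    host = urlparse(url).hostname or ""
    if "." not in host:
        return ""
    return "." + host.rsplit(".", 1)[-1]


fields = parse_record(RECORD)
facet = domain_facet(fields["provenance"][0])
first_capture, last_capture = fields["date"]
print(facet, first_capture, last_capture)
```

The facet is taken from the `provenance` value, which holds the original site URL, rather than from the `identifier`, whose host is the archive's own server. Treating the two `date` values as first and last capture is an assumption based on the harvest period named in slide 4.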