1. Self Archiving Tools Team
Description
Seed-funded by the Stanford University President's Fund, the Self Archiving Tools Team is developing
software tools to create digital archives of the unpublished “papers” of distinguished faculty,
scholars, and industry leaders (“luminaries”), and make them available on the next-generation
Internet. The team aims to solve the problem of getting this material online rapidly while the
luminary is still active and interested in participating, and to elicit valuable new knowledge and
meaning with high trust: the connections, the back-story, undocumented facts. We are developing
a largely automated, engaging, hosted software toolkit that will present the luminary with a visual
representation of their legacy for enhancement and, in the process, capture "content and context"
to enrich the online material and make it findable. Collections are linked across people, concepts,
and institutions and benefit from the network effect. http://luminaryarchives.stanford.edu
Benefits
Semantically linked digital archives offer a means to identify hot spots of innovation and enhance
opportunities for collaboration. The tools and the visualization techniques will demonstrate the
paths of influence, the progression of ideas, and the network of innovation that bring concepts
from great minds to industry and the world.
Technology
• We are developing a largely automated workflow to collect born-digital documents electronically,
format them for presentation and preservation, and deliver them to the luminary in a web-based
tool for organizing, editing, annotating, and approving them for public release. Physical material
may be digitized and integrated into this workflow, resulting in “hybrid” archives.
• We are developing automatic tagging of documents to improve discovery. We use software
algorithms to scan text for "entities" such as people, places, concepts, and organizations, and
are developing tools and workflows for editing the resulting tags.
• We are developing visualization techniques to identify concepts and timelines from the
entity extraction process and present them graphically. These representations may be
used to elicit annotations from the luminary. We are developing workflows to capture these
assertions and new-knowledge annotations, and to make them ready for next-generation
Internet discovery sites.
• We are applying next-generation web technologies such as the Semantic Web and Linked Open
Data to organize data and allow it to be shared and reused across applications and
institutions.
• Deliverables for each luminary include online database pages, visualizations, tagged
documents, browser views, annotations, and mashup pages (shown).
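The tagging and linking steps above can be illustrated with a minimal sketch. It is not the team's implementation: the source does not say which algorithms or vocabularies are used, so this example assumes a simple gazetteer (lookup list) in place of a real entity extractor, hypothetical `example.org` URIs, and the Dublin Core `references` property as one plausible way to state that a document mentions an entity. The names `GAZETTEER`, `tag_entities`, and `to_ntriples` are invented for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical gazetteer: surface form -> (entity type, illustrative URI).
# A production system would use a trained extractor and real Linked Data URIs.
GAZETTEER = {
    "Stanford University": ("organization", "http://example.org/org/stanford-university"),
    "Palo Alto": ("place", "http://example.org/place/palo-alto"),
    "semantic web": ("concept", "http://example.org/concept/semantic-web"),
}

# Assumed predicate for "document mentions entity" (Dublin Core Terms).
MENTIONS = "http://purl.org/dc/terms/references"


@dataclass(frozen=True)
class Tag:
    surface: str  # text as it appears in the document
    kind: str     # person, place, concept, organization, ...
    uri: str      # identifier the tag links to
    start: int    # character offset of the match


def tag_entities(text):
    """Return gazetteer matches found in text, ordered by position."""
    tags = []
    for surface, (kind, uri) in GAZETTEER.items():
        pattern = re.compile(r"\b" + re.escape(surface) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            tags.append(Tag(match.group(0), kind, uri, match.start()))
    return sorted(tags, key=lambda t: t.start)


def to_ntriples(doc_uri, tags):
    """Emit one N-Triples statement per distinct entity, in first-seen order."""
    seen = []
    for tag in tags:
        if tag.uri not in seen:
            seen.append(tag.uri)
    return [f"<{doc_uri}> <{MENTIONS}> <{uri}> ." for uri in seen]


if __name__ == "__main__":
    memo = "A memo written at Stanford University about the Semantic Web."
    for line in to_ntriples("http://example.org/doc/1", tag_entities(memo)):
        print(line)
```

Keeping each tag as a record with an offset means an editing tool could show the luminary exactly where a tag came from and let them accept, correct, or reject it before the statement is published.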
Team and Capabilities
The organization is currently a five-person team, including two developers, a scientist, a project
manager, and an operations assistant. The team supports back-end and database programming,
user interface programming, algorithm development and semantic web linking, document
processing, and project leadership. The team offers the following capabilities:
• "Snap to the grid" the legacy of a luminary to the linked data on the Internet.
• Process the digital and physical personal papers of luminaries (Stanford luminaries, or other
luminary collections and subject-group collections as directed).
• Visualize collections using faceted browsers, timelines, and concept graphs.
• Capture, transcribe, and link reminiscences and insights by luminaries.
• Facilitate standards and partnering across technologies and institutions.
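As a rough illustration of the data behind a timeline and a faceted browser, the sketch below groups tagged documents by year and counts how many documents mention each entity. The records in `DOCS` are made-up placeholders, and the record layout and the function names `build_timeline` and `facet_counts` are assumptions for this example, not a description of the team's tools.

```python
from collections import Counter, defaultdict

# Placeholder records standing in for tagged documents in a collection.
DOCS = [
    {"title": "Memo A", "year": 1975, "entities": ["Concept 1", "Person X"]},
    {"title": "Lecture notes B", "year": 1968, "entities": ["Concept 1"]},
    {"title": "Letter C", "year": 1968, "entities": ["Person X", "Organization Y"]},
]


def build_timeline(docs):
    """Group document titles by year and return (year, titles) pairs, oldest first."""
    by_year = defaultdict(list)
    for doc in docs:
        by_year[doc["year"]].append(doc["title"])
    return sorted(by_year.items())


def facet_counts(docs):
    """Count the number of documents that mention each entity."""
    counts = Counter()
    for doc in docs:
        # set() so an entity repeated within one document is counted once.
        counts.update(set(doc["entities"]))
    return counts


if __name__ == "__main__":
    for year, titles in build_timeline(DOCS):
        print(year, ", ".join(titles))
    for entity, n in facet_counts(DOCS).most_common():
        print(f"{entity}: {n} document(s)")
```

The year groups would feed a timeline view, while the per-entity counts are the numbers a faceted browser typically shows next to each filter.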
Will Snow, SALT Project Manager, Stanford University wdsnow-EE77@stanfordalumni.org Aug 2009