ePADD is a software package developed by Stanford University's Special Collections & University Archives that supports archival processes around the appraisal, ingest, processing, discovery, and delivery of email archives.
1. Potential Future Directions for ePADD
Peter Chan, ePADD Project Manager
Digital Archivist, Stanford Libraries
Workshop 2 "After the Digital Revolution"
London, January 25-26, 2018
2. Other Platforms used in Personal Digital Archives
Facebook
Twitter
WhatsApp, Slack
Calendar
YouTube
Photos
WordPress
3. Apply unique ePADD features to other File Types
Browsing of extracted entities
Merging of identities
Lexicon search
Query generator
Connection to authority files
Redacted version for public access
Restriction management
4. Target Different User Groups
Donors, archivists, and researchers of email archives in collecting repositories
Individuals who want to organize their own emails
Journalists who want to analyze email from their sources
Cultural institutions that want to retain the knowledge in the emails of outgoing staff
5. Improve Existing ePADD Functions
Better Screening of Sensitive Information
Improve Findability of Archives / Required Information
Better Search Capability
More Automatic Classification / Grouping of Contents
Better Label / Annotation Functions
Improve Technical Infrastructure and UX
Improve Interface with other Systems
6. Better Screening of Sensitive Information
Good basic and advanced search functions
Regular expressions according to institution policy
Ability to store structured keywords
Entities related to sensitive messages from DBpedia
Derive keywords from WordNet
Ontology created by systematic interview with users
Classifier trained to recognize email with particular sensitivities
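The "regular expressions according to institution policy" item above can be sketched as follows. This is a minimal illustration, not ePADD's implementation: the pattern names, the patterns themselves, and the `screen_message` function are all hypothetical, and a real policy would need patterns tuned to the institution and its jurisdiction.

```python
import re

# Hypothetical institution policy: names and patterns are illustrative only.
POLICY_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def screen_message(text):
    """Return {pattern_name: [matches]} for each policy pattern found in text."""
    hits = {}
    for name, pattern in POLICY_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[name] = found
    return hits


if __name__ == "__main__":
    print(screen_message("My SSN is 123-45-6789."))
```

Keeping the patterns in a named dictionary lets each hit be reported against the policy rule that triggered it, which a reviewer can then confirm or dismiss.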
7. Improve Findability / Discovery of Archives
Provide many thousands of metadata entries for a single archive
Provide cross-collection search of metadata from archives in one institution
Provide a platform for all institutions to host their email archives
Provide cross-institution search and browsing of common entities across all archives
Facilitate inclusion of metadata generated by ePADD in institutions’ catalog systems
Facilitate inclusion of metadata generated by ePADD in Wiki
Enhance the discovery module to facilitate crawling by Google
9. More Automatic Classification / Grouping of Contents
Correspondent with different email addresses
Traditional entity extraction (person, organization, location)
Fine-grained entity extraction (based on entity in DBpedia)
Word frequency
Content Classification - specific (booking confirmations, receipts, etc.)
Content Classification - general (sports, economics, etc.)
Topic modelling
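The first item above, a correspondent with different email addresses, can be illustrated with a simple grouping heuristic. This is a sketch under a stated assumption (two addresses belong to the same person when their display names match after normalization); it is not the method ePADD uses, and `group_addresses_by_name` is a hypothetical helper. Only the standard library's `email.utils.getaddresses` is relied on.

```python
from collections import defaultdict
from email.utils import getaddresses


def group_addresses_by_name(header_values):
    """Group email addresses under a normalized display name.

    Assumption: addresses whose display names match after lowercasing and
    collapsing whitespace belong to the same correspondent. Addresses with
    no display name are keyed by the address itself.
    """
    groups = defaultdict(set)
    for name, address in getaddresses(header_values):
        if not address:
            continue
        key = " ".join(name.lower().split()) or address.lower()
        groups[key].add(address.lower())
    return {key: sorted(addresses) for key, addresses in groups.items()}


if __name__ == "__main__":
    headers = [
        "Jane Doe <jane@example.org>, JANE DOE <jdoe@example.com>",
        "bob@example.net",
    ]
    print(group_addresses_by_name(headers))
```

A name-only heuristic will both over-merge (two people with the same name) and under-merge (nicknames, initials), so the groups are best treated as suggestions for an archivist to confirm.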
10. Better Label / Annotation Functions
Message-based annotations and labels with no semantics
Message-based labels with simple semantics
Message-based labels with more advanced semantics
Role-based annotations and labels with / without semantics
Text-based (alphanumeric) labels with semantics
Correspondent-based labels with semantics
Entity-based labels with semantics
11. Improve Technical Infrastructure and UX
Single user platform
Handle 650,000 messages
Multi-users
Web-based application
Handle millions of messages
Better user experience
Remote Reading Room
12. Improve Interface with other Systems
Export headers in a CSV file for network analysis
Export all or part of an archive in mbox for preservation or use in other email clients
Export confirmed correspondents in a CSV file for finding aids
Connect to image recognition system to generate metadata
Export confirmed entities in RDF for other linked data systems
Connect to the Wayback Machine for dead URLs
Provide an API (application programming interface) for machine consumption
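The header export for network analysis listed above could look roughly like this outside ePADD. The sketch reads an mbox file with Python's standard `mailbox` module and writes one CSV row per message. The choice of header fields and the function name `export_headers_to_csv` are assumptions made for illustration, not ePADD's actual export format.

```python
import csv
import mailbox

# Illustrative choice of headers; a real export might include more fields.
HEADER_FIELDS = ["From", "To", "Cc", "Date", "Subject"]


def export_headers_to_csv(mbox_path, csv_path):
    """Write one CSV row per message with selected headers; return the row count."""
    box = mailbox.mbox(mbox_path, create=False)
    count = 0
    try:
        with open(csv_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(HEADER_FIELDS)
            for message in box:
                # Missing headers become empty cells.
                writer.writerow([str(message.get(field, "")) for field in HEADER_FIELDS])
                count += 1
    finally:
        box.close()
    return count
```

The resulting From/To/Cc columns can be loaded into a network analysis tool as an edge list once multi-recipient cells are split into individual addresses.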
13. Technical Team
Sudheendra Hangal - Co-founder at Amuse Labs, Magic Lamp Software. Faculty member at Ashoka University. Worked at Sun. Stanford PhD CS.
Chinmay Narayanan - PhD Indian Institute of Technology. Research focuses on the interactions of programming languages, logic and formal semantics. Worked at Simen.
Chaiyasit (Sit) Manovit - Founder of Nimeyo, Ixora Technology. 1st hire at PwrLite, acquired by Xilinx. Worked at NCR, Intel, Sun, and Xilinx. Stanford PhD EE.
Peter Chan - Digital Archivist, ePADD Project Manager for 6 years; Co-founder of MyIPhoto.com; VP, Operations Planning at Bank of America, Asia.
Josh Schneider - Assistant University Archivist, ePADD Community Manager