1. ePADD: Email, Process, Appraise, Discover, Deliver
CurateCamp 2015
Peter Chan, Digital Archivist
Apr. 23, 2015
2. Email Archives in Our Collections
• Robert Creeley: ~50,000
• Richard Fikes: ~100,000
• Terry Winograd: ~650,000
• Benoit Mandelbrot
• Harrison Studio
• Stanford Humanities Lab
3. Common Ways to Archive Emails
Paper
• Print the emails
• File the printed emails in the respective content folders
Electronic
• Archive emails using functions provided in email clients
5. Normalization
• Converts email from closed, proprietary file formats to standard, portable formats
Tools:
• Emailchemy
• MailStore
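Emailchemy and MailStore handle conversion out of proprietary formats. The target side of normalization can be illustrated with Python's standard `mailbox` module: the minimal sketch below assumes the source has already been exported to mbox and writes each message out as a standalone RFC 5322 `.eml` file. The function name `mbox_to_eml` and the zero-padded file naming are hypothetical choices, not part of either tool.

```python
import mailbox
from pathlib import Path


def mbox_to_eml(mbox_path, out_dir):
    """Write every message in an mbox file to its own .eml file.

    Returns the list of paths written. File names follow a hypothetical
    zero-padded sequence: 000001.eml, 000002.eml, ...
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # create=False: fail instead of silently creating an empty mailbox
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for i, msg in enumerate(box, start=1):
            target = out / f"{i:06d}.eml"
            # as_bytes() serializes headers and body as an RFC 5322 message
            target.write_bytes(msg.as_bytes())
            written.append(target)
    finally:
        box.close()
    return written
```

One file per message keeps each item independently addressable, which suits later appraisal and processing steps.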
6. Appraisal
• Owner:
– Filter messages to/from certain correspondents
– Review messages containing certain words (divorce, daughter, etc.)
• Curator:
– Ensure certain information exists
– Get an overall view of who, where, and what is mentioned in the messages
Tools (owner and curator):
• Email clients
• ePADD
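The owner's two appraisal filters can be sketched with the standard `email` package. This assumes messages are `email.message.EmailMessage` objects (e.g. parsed with `email.policy.default`, which provides `get_content()`); the function names and the plain case-insensitive substring match are illustrative assumptions, not how ePADD or any email client implements the feature.

```python
from email.utils import getaddresses


def participants(msg):
    """Return the lower-cased addresses in the From, To and Cc headers."""
    fields = msg.get_all("From", []) + msg.get_all("To", []) + msg.get_all("Cc", [])
    return {addr.lower() for _, addr in getaddresses(fields) if addr}


def body_text(msg):
    """Concatenate the text/plain parts of a message."""
    parts = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            parts.append(part.get_content())
    return "\n".join(parts)


def appraise(messages, correspondents, keywords):
    """Split messages into (flagged for owner review, everything else).

    A message is flagged if any listed correspondent appears in its
    From/To/Cc headers, or any keyword occurs in its subject or body.
    """
    wanted = {c.lower() for c in correspondents}
    words = [k.lower() for k in keywords]
    flagged, other = [], []
    for msg in messages:
        text = (str(msg.get("Subject", "")) + "\n" + body_text(msg)).lower()
        if participants(msg) & wanted or any(w in text for w in words):
            flagged.append(msg)
        else:
            other.append(msg)
    return flagged, other
```

Substring matching is deliberately loose: "daughter" also matches "granddaughter", which errs toward flagging more messages for review rather than fewer.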
10. Processing
• Place restrictions on messages containing:
– personally identifiable information (Social Security numbers, credit card numbers, etc.)
– privacy information (student grades, salary, grievances, medical information, etc.)
– information stipulated by donors
Tools:
• ePADD
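Flagging the first category above can be partly automated with pattern matching. A minimal sketch, assuming US-style SSNs written as `123-45-6789` and card numbers of 13 to 19 digits; the regular expressions and the `find_pii` name are hypothetical. The Luhn checksum is the standard check digit on payment card numbers and cuts down false positives from arbitrary digit runs. Privacy information and donor stipulations are much harder to capture with patterns like these.

```python
import re

# Hypothetical patterns: US SSNs written as 123-45-6789, and card
# numbers of 13-19 digits optionally separated by spaces or hyphens.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text):
    """Return (kind, matched_text) pairs that may warrant a restriction."""
    hits = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_ok(digits):  # discard digit runs that fail the checksum
            hits.append(("card", m.group()))
    return hits
```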
16. Processing
Organizing
• Group together messages containing certain words (project names, event names)
• Gather together all messages belonging to the same person across multiple email addresses
• Group all image attachments in one place
• List all person, location, and organization entities
Tools:
• ePADD
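One way to gather a person's messages across multiple addresses is to treat each address as a node, link addresses that appear under the same display name, and take connected components with a union-find (disjoint-set) structure. This is a minimal sketch assuming display-name matching is good enough evidence; that heuristic and the `merge_identities` name are assumptions, and ePADD may combine other evidence.

```python
from collections import defaultdict
from email.utils import getaddresses


class UnionFind:
    """Minimal disjoint-set structure with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def merge_identities(header_values):
    """Group addresses that share a display name.

    header_values: raw From/To/Cc strings, e.g. "Ann Lee <ann@a.org>".
    Returns a list of sets of lower-cased addresses, one set per person.
    """
    uf = UnionFind()
    first_addr_for_name = {}
    for name, addr in getaddresses(header_values):
        if not addr:
            continue
        addr = addr.lower()
        uf.find(addr)  # register the address even if it has no display name
        key = " ".join(name.lower().split())  # normalize case and spacing
        if key:
            if key in first_addr_for_name:
                uf.union(addr, first_addr_for_name[key])
            else:
                first_addr_for_name[key] = addr
    groups = defaultdict(set)
    for addr in uf.parent:
        groups[uf.find(addr)].add(addr)
    return list(groups.values())
```

Display names are ambiguous (two different people can share one), so in practice the merged groups would be reviewed by the archivist before being relied on.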
28. Processing
Extract interesting items
• List all books and movies mentioned in all messages
• Give a breakdown of organizations by type (universities, companies, museums, etc.)
• List events
• List all topics discussed in messages
• Create local authority records
Tools:
• Future versions of ePADD
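These features are marked as future work. As a rough illustration of the organization breakdown only, the sketch below buckets organization names (for example, organization entities produced by NER) by keywords in the name and counts them. The keyword table is a hypothetical heuristic; an authority source would classify far more reliably.

```python
from collections import Counter

# Hypothetical keyword rules; checked in this order, first match wins.
TYPE_RULES = [
    ("Universities", ("university", "college", "institute of technology")),
    ("Museums", ("museum", "gallery")),
    ("Companies", ("inc", "corp", "corporation", "llc", "ltd", "company")),
]


def org_type(name):
    """Guess an organization type from the words in its name."""
    tokens = name.lower().replace(",", " ").replace(".", " ").split()
    text = " ".join(tokens)
    for label, keywords in TYPE_RULES:
        for kw in keywords:
            # pad with spaces so only whole words or phrases match
            if f" {kw} " in f" {text} ":
                return label
    return "Other"


def breakdown(org_names):
    """Count distinct organization names per guessed type."""
    return Counter(org_type(n) for n in set(org_names))
```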
29. Discovery
• Existence of email archives
– Institution catalog system, Wiki, Finding Aid Repository (OAC, etc.), search engines
• Information about the email archives (as in traditional finding aids)
– Finding aids
• Information about the email archives (all person, location, and organization entities and correspondents)
– ePADD
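A correspondent list like the one ePADD exposes for discovery can be approximated from message headers alone. As a small sketch, the function below counts how many messages each address appears in (From, To, Cc), excluding the archive creator. The `owner` parameter and the once-per-message counting rule are assumptions for illustration.

```python
from collections import Counter
from email.utils import getaddresses


def top_correspondents(messages, owner, n=10):
    """Return the n most frequent correspondents other than the owner.

    messages: email.message.Message objects.
    owner: the archive creator's address, excluded from the counts.
    """
    owner = owner.lower()
    counts = Counter()
    for msg in messages:
        fields = []
        for header in ("From", "To", "Cc"):
            fields.extend(msg.get_all(header, []))
        # a set, so each correspondent is counted once per message
        seen = {a.lower() for _, a in getaddresses(fields) if a}
        seen.discard(owner)
        counts.update(seen)
    return counts.most_common(n)
```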
33. Delivery
• Email messages
• Full-text search
• Request copies
• See attachment files (documents, spreadsheets)
• See image attachments
• Bulk search
• Annotate messages
• Organize messages
Tools:
• Email clients
• ePADD
• Quickview Plus
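Full-text search is typically backed by an inverted index that maps each term to the messages containing it. A minimal in-memory sketch with AND semantics follows; the tokenizer (lower-cased ASCII letters and digits only) and the `InvertedIndex` name are simplifying assumptions, not a description of ePADD's own search.

```python
import re
from collections import defaultdict

# Simplifying assumption: terms are runs of ASCII letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class InvertedIndex:
    """Map each term to the set of message ids whose text contains it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, msg_id, text):
        for term in tokenize(text):
            self.postings[term].add(msg_id)

    def search(self, query):
        """Return ids of messages containing every query term (AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # .get avoids inserting empty entries into the defaultdict
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

A bulk search over a list of names could be approximated by calling `search` once per name and reporting which names have hits.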
41. Named Entity Recognition
• Stanford Named Entity Recognizer (NER)
– Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pp. 363-370. http://nlp.stanford.edu/~manning/papers/gibbscrf3.pdf
– GNU General Public License (v2 or later)
• OpenNLP
– Apache License
• Custom NER
– Use address book, Wikipedia, Freebase
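The custom NER option can be as simple as dictionary (gazetteer) lookup. The sketch below assumes entries such as names from an address book, or titles pulled from Wikipedia or Freebase, have already been collected as `(name, type)` pairs; it tags the longest matching token sequence at each position. The function names and entity type labels are hypothetical.

```python
import re

# Words, allowing internal periods, apostrophes and hyphens (O'Neil, Jean-Luc).
WORD_RE = re.compile(r"\w+(?:[.'-]\w+)*")


def build_gazetteer(entries):
    """entries: iterable of (surface name, entity type) pairs.

    Returns a dict from lower-cased token tuples to entity types,
    plus the length in tokens of the longest entry.
    """
    gaz = {}
    for name, etype in entries:
        key = tuple(t.lower() for t in WORD_RE.findall(name))
        if key:
            gaz[key] = etype
    longest = max((len(k) for k in gaz), default=0)
    return gaz, longest


def tag(text, gaz, longest):
    """Greedy longest-match lookup; returns (surface text, type) pairs."""
    matches = list(WORD_RE.finditer(text))
    words = [m.group().lower() for m in matches]
    found, i = [], 0
    while i < len(words):
        # try the longest candidate span first, then shorter ones
        for n in range(min(longest, len(words) - i), 0, -1):
            etype = gaz.get(tuple(words[i:i + n]))
            if etype:
                start, end = matches[i].start(), matches[i + n - 1].end()
                found.append((text[start:end], etype))
                i += n
                break
        else:
            i += 1  # no entry starts at this word
    return found
```

Dictionary lookup only finds names already on the list; statistical recognizers such as Stanford NER or OpenNLP are what catch unseen names, so the two approaches complement each other.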