The Realization of an Agent-Based Automatic E-mail Handling System
CHEN Xiao-ping, LIU Gui-quan, WANG Xu-fa, ZHAO Lei
(Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230027)
Abstract  E-mail is currently an important network-based means of communication. Based on agent techniques and machine learning methods, this paper designs and implements an interface agent that can handle e-mails automatically for its user.

Keywords  agent, machine learning, interface agent
1 Introduction

As an important means of communication, e-mail is used by millions of network users, and their number keeps increasing. Although users receive useful mails, they also receive a great deal of "garbage mail". Such mail not only wastes computer resources but also makes it harder for users to reach useful information. Users therefore hope that the system can handle e-mails automatically: inform the user when important mails arrive, and delete the garbage mails.

For simplicity, the system was designed for English e-mails only.
2 The Basic Idea

It is reasonable to assume that a user can appropriately determine the relevance of a particular mail to his or her interests. We model the user's judgment and action on a particular mail as a tuple:

< Document, Situation, Action >

Such tuples are called the user interest model, or interest model for short. Here Document contains the sender of the e-mail, the sending date, the address of the sender, etc., together with a compressed representation of the mail text; Situation refers to the importance of the mail based on Document; and Action is the user's action on the mail, such as delete, save, print, reply, etc. In this paper, Situation is divided into 7 levels:

Situation = {Excellent, Very Good, Good, Normal, Poor, Very Bad, Terrible}

The mail agent learns her user's interest model while the user handles his or her e-mails. At the very beginning the agent has no knowledge about her user and cannot give the user any help; but once she has learned to a certain degree, she can actively handle e-mails for her user.
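To make the model concrete, the following Python sketch shows one possible encoding of the interest-model tuple and the 7 Situation levels. The class and field names are illustrative assumptions, not the paper's original code.

from dataclasses import dataclass
from enum import IntEnum

class Situation(IntEnum):
    """The 7 importance levels used by the agent."""
    EXCELLENT = 7
    VERY_GOOD = 6
    GOOD = 5
    NORMAL = 4
    POOR = 3
    VERY_BAD = 2
    TERRIBLE = 1

@dataclass
class InterestRecord:
    """One <Document, Situation, Action> tuple of the user interest model."""
    sender: str           # from the e-mail header
    sending_date: str     # from the e-mail header
    sender_address: str   # from the e-mail header
    text_vector: dict     # compressed representation of the mail text (stem -> weight)
    situation: Situation  # importance of the mail, judged from Document
    action: str           # e.g. "delete", "save", "print", "reply"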
3 The Method and Implementation

Obviously, some features of an e-mail have no effect on the user's interest. Thus, for the mail agent to learn her user's interest model, she must remove those useless features and represent the mail in a compressed fashion. In the vector space information retrieval paradigm, documents are represented as vectors[5]: assume some dictionary vector D, where each element di is a word. Each document then has a vector V, where element vi is the weight of word di for that document. If the document does not contain di, then vi = 0.

In the typical information retrieval setting there is a fixed collection of documents from which an inverted index is created. For the application discussed here, however, e-mails arrive unexpectedly and the dictionary vector D is difficult to define beforehand. The traditional vector space representation is therefore inappropriate for e-mails and needs to be modified, as discussed in detail below.

3.1 Representing E-mails

An e-mail consists of two components: the header and the body. The header contains control information, such as the sender of the e-mail, the sending date, the address of the sender, etc.; the body is the mail text.

When a new mail arrives, the agent reads in the header, then analyzes and saves the control information as a history record; such information can help further processing. Next, the agent reads in the mail text and extracts the individual words from it, so that the mail text is represented as a vector:

D = (d1, d2, d3, …, dn)

where di (i ∈ {1, 2, …, n}) is a word appearing in the mail body. For any di in D, if di belongs to the stop words (words so common as to be useless as discriminators, like "the" and "is"; these words are organized as a Stop list in the system), then di is removed from D. The agent then uses the Porter suffix-stripping algorithm[1][2] to reduce the remaining words in D to their stems; for instance, "computer", "computing" and "computability" are all reduced to "comput".

Words are then weighted using a TFIDF scheme: the weight v_di of a word di in an e-mail text D is derived by multiplying a term frequency (TF) component by an inverse document (here, e-mail) frequency (IDF) component:

v_di = (0.5 + 0.5 · tf_i / tf_max) · log(n / df_i)    (Eqn. 1)

where tf_i is the number of times word di appears in e-mail text D (the term frequency), tf_max is the maximum term frequency over all words in D, n is the number of e-mails that have been handled, and df_i is the number of handled e-mails that contain di (the document frequency). The process is illustrated in Figure 1.
[Figure 1: The process of e-mail representation. The header feeds the history record; the body yields a word stream, which the stop list filters into keywords, suffix-stripping reduces to stems, and the TFIDF scheme turns into a weighted vector.]
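As an illustration of this pipeline, a minimal Python sketch follows: it tokenizes a mail body, applies a Stop list and Porter stemming (NLTK's PorterStemmer is used here as a stand-in for the paper's implementation of [1]), and computes the (Eqn. 1) weights. The function name, the toy stop list, and the df bookkeeping are assumptions for the example, not the system's actual code.

import math
import re
from collections import Counter
from nltk.stem.porter import PorterStemmer  # Porter suffix-stripping algorithm [1]

STOP_LIST = {"the", "is", "a", "an", "of", "and", "to"}  # illustrative subset only

def represent(body: str, n_handled: int, df: dict) -> dict:
    """Turn a mail body into a weighted stem vector per (Eqn. 1).

    n_handled: number of e-mails handled so far (n in Eqn. 1), assumed >= 1.
    df: maps each stem to the number of handled e-mails containing it (df_i).
    """
    words = re.findall(r"[a-z]+", body.lower())          # extract individual words
    keywords = [w for w in words if w not in STOP_LIST]  # drop stop words
    stemmer = PorterStemmer()
    stems = [stemmer.stem(w) for w in keywords]          # e.g. "computing" -> "comput"
    tf = Counter(stems)                                  # term frequencies tf_i
    tf_max = max(tf.values(), default=1)                 # maximum term frequency in D
    return {s: (0.5 + 0.5 * tf[s] / tf_max)
               * math.log(n_handled / max(df.get(s, 1), 1))
            for s in tf}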
For the system discussed here, the e-mail Situation is divided into 7 levels, and there is no further distinction between e-mails belonging to the same Situation; log(n/df_i) in (Eqn. 1) can therefore be computed with the per-Situation document frequency, i.e., with df_i taken as the number of e-mails in the current Situation that contain di.

3.2 Agent's Learning Process

The agent's learning process can be divided into 3 stages according to the agent's degree of adeptness:

1. Learning Stage

At this stage, the agent has no experience and merely accumulates knowledge about her user's interest model from the user's actions and evaluations. She cannot provide the user any help yet.
2. Growing-up Stage

After the agent has gained some experience, she enters the growing-up stage. With the gained experience, the agent can assist her user in dealing with e-mails. However, she is not yet competent enough and needs further learning from her user's feedback, especially in unexplored situations. For each e-mail the agent presents her evaluation to the user; if the user is not satisfied with it, he or she can give his or her own evaluation, and the agent updates the interest model accordingly.

3. Applying Stage

When the agent has accumulated enough experience, achieves high accuracy, and is permitted to handle e-mails for the user, she is in the applying stage. As the final stage of learning, the agent can now automatically evaluate and handle e-mails for her user; for instance, she can delete a "Terrible" e-mail or interrupt the user's current work when an important e-mail arrives.

3.3 Agent's Learning Method

The e-mail agent employs a statistic-based learning method. First, the agent derives a normalized vector for each Situation based on statistics over a large number of e-mails (the derivation is discussed below). Second, the agent chooses an action according to the similarity between the current e-mail vector and each normalized vector. In this process the agent faces the problem of dictionary construction and maintenance, which is discussed first.

1. Dictionary Construction and Maintenance

The agent's dictionary is constructed dynamically. As e-mails are represented by stems, the elements of the dictionary are also stems; in addition, the occurring frequency of every stem in every Situation is stored in the dictionary. The dictionary is initially empty; during the learning process, new stems are added and the occurring frequencies of some old stems are recalculated.

As the number of handled e-mails increases, the agent's dictionary may become larger and larger, and retrieval speed will decrease, so a mechanism to maintain the dictionary is very important. The agent uses the following rules to maintain her dictionary:

Rule 1: if a stem occurs very rarely, the agent deletes it from her dictionary.

Rule 2: if a stem appears with nearly the same frequency in every Situation, it is useless for classifying e-mails; the agent then deletes it from her dictionary and stores it in her Stem-Stop list (analogous to the Stop list introduced before). The agent uses frequency equilibration (FE) to determine whether a stem should be stored in the Stem-Stop list. A stem's FE is calculated as in (Eqn. 2), where E is the FE of the stem, Si is the frequency with which the stem occurs in Situation i (i = 1..7 refers to the 7 levels), and SA is the mean of the Si, i.e. SA = (S1 + … + S7)/7:

E = 1 / √( Σ_{i=1}^{7} (S_i − S_A)² )    (Eqn. 2)

If a stem's FE exceeds a threshold (adjustable by the user), this rule is applied (a code sketch follows Rule 3 below).

Rule 3: the user can either add words to the Stop list or delete words from it.
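The maintenance rules can be sketched as follows. compute_fe implements (Eqn. 2) directly; the threshold values and function names are illustrative assumptions, and the dictionary is assumed to map each stem to its list of 7 per-Situation frequencies.

import math

def compute_fe(situation_freqs: list) -> float:
    """Frequency equilibration (Eqn. 2): large when the stem's
    frequency is nearly equal across the 7 Situations."""
    s_a = sum(situation_freqs) / len(situation_freqs)      # S_A, the mean
    spread = math.sqrt(sum((s - s_a) ** 2 for s in situation_freqs))
    return float("inf") if spread == 0 else 1.0 / spread   # E = 1 / sqrt(sum (S_i - S_A)^2)

def maintain_dictionary(dictionary: dict, stem_stop: set,
                        min_count: int = 2, fe_threshold: float = 10.0):
    """Apply Rules 1 and 2 to the stem dictionary (illustrative thresholds)."""
    for stem, freqs in list(dictionary.items()):
        if sum(freqs) < min_count:               # Rule 1: very rare stem
            del dictionary[stem]
        elif compute_fe(freqs) > fe_threshold:   # Rule 2: evenly spread stem
            del dictionary[stem]
            stem_stop.add(stem)                  # remember it in the Stem-Stop list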
2. Learning Method

The agent's learning module uses a statistic-based method. For every stem in the dictionary, the agent calculates its occurring frequency in each Situation. The normalized vector of a Situation is obtained by sorting the stems according to their occurring frequencies in that Situation; Situation i's normalized vector is denoted by Di.

Let the current e-mail vector be denoted by D. The similarity between D and Di is obtained by calculating the cosine of D and Di: SIM(D, Di) = Cos(D, Di), where D and Di must use the same stem order.
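A minimal sketch of this step, assuming each vector is stored as a stem-to-weight mapping that is aligned over a shared stem order before the cosine is taken; normalized_vector reflects one simple reading of the frequency-based derivation described above:

import math

def normalized_vector(dictionary: dict, situation: int) -> dict:
    """Situation i's normalized vector Di: stems weighted by their
    occurring frequency in that Situation (one reading of the paper)."""
    freqs = {stem: f[situation] for stem, f in dictionary.items()}
    total = sum(freqs.values()) or 1.0
    return {stem: v / total for stem, v in freqs.items()}

def cosine(d: dict, d_i: dict) -> float:
    """SIM(D, Di) = Cos(D, Di), both vectors aligned on the same stems."""
    keys = set(d) | set(d_i)  # shared stem order
    dot = sum(d.get(k, 0.0) * d_i.get(k, 0.0) for k in keys)
    norm = (math.sqrt(sum(v * v for v in d.values()))
            * math.sqrt(sum(v * v for v in d_i.values())))
    return 0.0 if norm == 0 else dot / norm

def classify(mail_vec: dict, normalized: dict):
    """Choose the Situation whose normalized vector is most similar."""
    best = max(normalized, key=lambda s: cosine(mail_vec, normalized[s]))
    return best, cosine(mail_vec, normalized[best])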
The Situation corresponding to the maximum of the 7 similarities is the Situation the agent chooses.

The information in the e-mail header can then be used to revise the result; for example, a user may be especially interested in e-mails from particular people. The agent learns such revising rules through inductive learning methods, which are beyond the topic of this paper.
3. Action Prediction

Currently, the e-mail agent usually adopts the same user-defined action for all e-mails in the same Situation. After the agent has determined which Situation the current e-mail belongs to, she chooses one of the following behaviors: if the similarity between the e-mail and the Situation is above the tell-me threshold, the agent suggests that the user take the corresponding action; if the similarity is above the do-it threshold, the agent takes the corresponding action autonomously. The default values of the tell-me threshold and the do-it threshold are 0.7 and 0.95, respectively. Both thresholds can be set by the user, and the do-it threshold must be greater than the tell-me threshold.
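This two-threshold policy can be sketched as below. The default values follow the paper; the function name and the returned tags are illustrative assumptions.

TELL_ME_THRESHOLD = 0.7   # suggest the action to the user
DO_IT_THRESHOLD = 0.95    # act autonomously; must exceed the tell-me threshold

def predict_action(similarity: float, action: str) -> str:
    """Decide how to apply the Situation's user-defined action."""
    if similarity >= DO_IT_THRESHOLD:
        return f"do:{action}"        # agent performs the action itself
    if similarity >= TELL_ME_THRESHOLD:
        return f"suggest:{action}"   # agent proposes the action to the user
    return "ask-user"                # below both thresholds: leave it to the user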
4 Experimental Results

In order to test the capability of the e-mail agent, 55 e-mails were selected for the experiment. The parameters of the selected e-mails and the experimental results are as follows:

1. Compression properties: the maximum compressibility is 77.1%, the minimum compressibility is 35.7%, and the average length after compression is 113 words.

2. Correctness of prediction: the number of predicted e-mails is 60 (with repetitions), of which 16 were wrongly predicted.
[Figure 2: The relationship between errors and handled e-mails. x-axis: number of handled e-mails (10 to 60); y-axis: number of errors (0 to 16).]
The relationship between the number of wrongly predicted e-mails and the number of handled e-mails is illustrated in Figure 2, which indicates that the error rate decreases as the number of handled e-mails grows: 11 errors occur in the first 20 e-mails, but only 5 errors occur in the last 40 e-mails.

5 Comparison with Related Works

With the fast development of the Internet, network-based services are currently a hotspot of computer applications. Some corporations (e.g., Microsoft[3]) and institutes[4] have researched how to handle e-mails automatically for the user, but for a variety of reasons those references do not cover the details of realization.

6 Conclusion and Future Directions

The experimental results show that the performance of the e-mail agent is to some extent satisfactory. Moreover, the method discussed in this paper can be applied to some other Internet-based services. Since the e-mail agent uses a statistic-based learning method, context-related problems are inevitable; the agent would therefore be more suitable for practical use if a natural language processing module were added.
References
[1] M.F. Porter. An algorithm for suffix stripping. Program, 14(3): 130-137, 1980.
[2] W.B. Frakes. Stemming algorithms. In: W.B.
Frakes and R. Baeza-Yates, editors, Information Re-
trieval: Data Structures and Algorithms, pp. 131-160.
Prentice Hall, Inc., Englewood Cliffs, NJ, 1992.
[3] Based on an introduction by Kai-Fu Lee, director of Microsoft Research China, 1998.
[4] Y. Lashkari, et al., Collaborative Interface Agents,
MIT Media Laboratory (1996).
[5] G. Salton and M.J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.