Message Understanding Conference-6: A Brief History

Ralph Grishman
Dept. of Computer Science
New York University
715 Broadway, 7th Floor
New York, NY 10003, USA
grishman@cs.nyu.edu

Beth Sundheim
Naval Command, Control and Ocean Surveillance Center
Research, Development, Test and Evaluation Division (NRaD)
Code 44208
53140 Gatchell Road
San Diego, California 92152-7420
sundheim@pojke.nosc.mil
Abstract

We have recently completed the sixth in a series of "Message Understanding Conferences", which are designed to promote and evaluate research in information extraction. MUC-6 introduced several innovations over prior MUCs, most notably in the range of different tasks for which evaluations were conducted. We describe some of the motivations for the new format and briefly discuss some of the results of the evaluations.

1 The MUC Evaluations

We have just completed the sixth in a series of Message Understanding Conferences, which have been organized by NRaD, the RDT&E division of the Naval Command, Control and Ocean Surveillance Center (formerly NOSC, the Naval Ocean Systems Center), with the support of DARPA, the Defense Advanced Research Projects Agency. This paper looks briefly at the history of these Conferences and then examines the considerations which led to the structure of MUC-6.¹

The Message Understanding Conferences were initiated by NOSC to assess and to foster research on the automated analysis of military messages containing textual information. Although they are called "conferences", the distinguishing characteristic of the MUCs is not the conferences themselves but the evaluations to which participants must submit in order to be permitted to attend the conference. For each MUC, participating groups have been given sample messages and instructions on the type of information to be extracted, and have developed a system to process such messages. Then, shortly before the conference, participants are given a set of test messages to be run through their system (without making any changes to the system); the output of each participant's system is then evaluated against a manually-prepared answer key.

The MUCs are remarkable in part because of the degree to which these evaluations have defined a program of research and development. DARPA has a number of information science and technology programs which are driven in large part by regular evaluations. The MUCs are notable, however, in that they have in large part shaped the research program in information extraction and brought it to its current state.²

¹ The full proceedings of the conference are to be distributed by Morgan Kaufmann Publishers, San Mateo, California; earlier MUC proceedings, for MUC-3, 4, and 5, are also available from Morgan Kaufmann.

² There were, however, a number of individual research efforts in information extraction underway before the first MUC, including the work on information formatting of medical narrative by Sager at New York University; the formatting of naval equipment failure reports by Marsh at the Naval Research Laboratory; and the DBG work by Logicon for RADC.

2 Early History

MUC-1 (1987) was basically exploratory; each group designed its own format for recording the information in the document, and there was no formal evaluation. By MUC-2 (1989), the task had crystallized as one of template filling. One receives a description of a class of events to be identified in the text; for each of these events one must fill a template with information about the event. The template has slots for information about the event, such as the type of event, the agent, the time and place, the effect, etc. For MUC-2, the template had 10 slots. Both MUC-1 and MUC-2 involved sanitized forms of military messages about naval sightings and engagements.

The second MUC also worked out the details of the primary evaluation measures, recall and precision. To present them in the simplest terms, suppose the answer key has $N_{\text{key}}$ filled slots, and that a system fills $N_{\text{correct}}$ slots correctly and $N_{\text{incorrect}}$ slots incorrectly (with some other slots possibly left unfilled). Then

$$\text{recall} = \frac{N_{\text{correct}}}{N_{\text{key}}}$$
and

$$\text{precision} = \frac{N_{\text{correct}}}{N_{\text{correct}} + N_{\text{incorrect}}}$$

For MUC-3 (1991), the task shifted to reports of terrorist events in Central and South America, as reported in articles provided by the Foreign Broadcast Information Service, and the template became somewhat more complex (18 slots). This same task was used for MUC-4 (1992), with a further small increase in template complexity (24 slots).

MUC-5 (1993), which was conducted as part of the Tipster program,³ represented a substantial further jump in task complexity. Two tasks were involved, international joint ventures and electronic circuit fabrication, in two languages, English and Japanese. The joint venture task required 11 templates with a total of 47 slots for the output (double the number of slots defined for MUC-4), and the task documentation was over 40 pages long.

One innovation of MUC-5 was the use of a nested template structure. In earlier MUCs, each event had been represented as a single template (in effect, a single record in a database) with a large number of attributes. This format proved awkward when an event had several participants (e.g., several victims of a terrorist attack) and one wanted to record a set of facts about each participant. This sort of information could be much more easily recorded in the hierarchical structure introduced for MUC-5, in which there was a single template for an event, which pointed to a list of templates, one for each participant in the event.⁴

3 MUC-6: initial goals

DARPA convened a meeting of Tipster participants and government representatives in December 1993 to define goals and tasks for MUC-6.⁵ Among the goals which were identified were:

• demonstrating task-independent component technologies of information extraction which would be immediately useful
• encouraging work to make information extraction systems more portable
• encouraging work on "deeper understanding"

Each of these can be seen in part as a reaction to the trends in the prior MUCs. The MUC-5 tasks, in particular, had been quite complex, and a great effort had been invested by the government in preparing the training and test data and by the participants in adapting their systems for these tasks. Most participants worked on the tasks for 6 months; a few (the Tipster contractors) had been at work on the tasks for considerably longer. While the performance of some systems was quite impressive (the best got 57% recall, 64% precision overall, with 73% recall and 74% precision on the 4 "core" template types), the question naturally arose as to whether there were many applications for which an investment of one or several developers over half a year (or more) could be justified. Furthermore, while so much effort had been expended, a large portion was specific to the particular tasks. It wasn't clear whether much progress was being made on the underlying technologies which would be needed for better understanding.

To address these goals, the meeting formulated an ambitious menu of tasks for MUC-6, with the idea that individual participants could choose a subset of these tasks. We consider the three goals in the three sections below, and describe the tasks which were developed to address each goal.

4 Short-term subtasks

The first goal was to identify, from the component technologies being developed for information extraction, functions which would be of practical use, would be largely domain independent, and could in the near term be performed automatically with high accuracy. To meet this goal the committee developed the "named entity" task, which basically involves identifying the names of all the people, organizations, and geographic locations in a text.

The final task specification, which also involved time, currency, and percentage expressions, used SGML markup to identify the names in a text. Figure 1 shows a sample sentence with named entity annotations. The tag ENAMEX ("entity name expression") is used for both people and organization names; the tag NUMEX ("numeric expression") is used for currency and percentages.
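Because the named entity annotations are ordinary inline SGML tags, they are easy to process mechanically. As an illustration only (this is not part of the MUC-6 tools), the sketch below pulls the ENAMEX and NUMEX annotations out of the Figure 1 sentence with a regular expression. It assumes that tags are not nested and carry a single TYPE attribute, as in Figure 1; the function name `extract_entities` is hypothetical, and arbitrary SGML would call for a real parser.

```python
import re

# Hypothetical pattern: assumes non-nested ENAMEX/NUMEX tags whose only
# attribute is TYPE="...", as in the Figure 1 annotation.
TAG_PATTERN = re.compile(
    r'<(?P<tag>ENAMEX|NUMEX)\s+TYPE="(?P<type>[^"]+)">(?P<text>.*?)</(?P=tag)>',
    re.DOTALL,
)


def extract_entities(annotated: str) -> list[tuple[str, str, str]]:
    """Return (tag, TYPE, covered text) for each annotation, in text order."""
    return [
        # Collapse line breaks inside an annotation into single spaces.
        (m.group("tag"), m.group("type"), " ".join(m.group("text").split()))
        for m in TAG_PATTERN.finditer(annotated)
    ]


sentence = (
    'Mr. <ENAMEX TYPE="PERSON">Dooner</ENAMEX> met with '
    '<ENAMEX TYPE="PERSON">Martin Puris</ENAMEX>, president and chief '
    'executive officer of <ENAMEX TYPE="ORGANIZATION">Ammirati & Puris'
    '</ENAMEX>, about <ENAMEX TYPE="ORGANIZATION">McCann</ENAMEX>\'s '
    'acquiring the agency with billings of '
    '<NUMEX TYPE="MONEY">$400 million</NUMEX>, but nothing has materialized.'
)

for tag, etype, text in extract_entities(sentence):
    print(f"{tag:7} {etype:13} {text}")
```

Each result pairs a tag with its TYPE, so a system's (TYPE, text) pairs could in principle be compared with those in an answer key.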
Mr. <ENAMEX TYPE="PERSON">Dooner</ENAMEX> met with <ENAMEX TYPE="PERSON">Martin
Puris</ENAMEX>, president and chief executive officer of <ENAMEX
TYPE="ORGANIZATION">Ammirati & Puris</ENAMEX>, about <ENAMEX
TYPE="ORGANIZATION">McCann</ENAMEX>'s acquiring the agency with billings of <NUMEX
TYPE="MONEY">$400 million</NUMEX>, but nothing has materialized.

Figure 1: Sample named entity annotation.

³ Tipster is a U.S. Government program of research and development in the areas of information retrieval and information extraction.

⁴ In fact the MUC-5 structure was much more complex, because there were separate templates for products, time, activities of organizations, etc.

⁵ The representatives of the research community were Jim Cowie, Ralph Grishman (committee chair), Jerry Hobbs, Paul Jacobs, Len Schubert, Carl Weir, and Ralph Weischedel. The government people attending were George Doddington, Donna Harman, Boyan Onyshkevych, John Prange, Bill Schultheis, and Beth Sundheim.

5 Portability

The second goal was to focus on portability in the information extraction task: the ability to rapidly retarget a system to extract information about a different class of events. The committee felt that it was important to demonstrate that useful extraction systems could be created in a few weeks. To meet this goal, we decided that the information extraction task for MUC-6 would have to involve a relatively simple template, more like MUC-2 than MUC-5; this was dubbed "mini-MUC". In keeping with the hierarchical template structure introduced in MUC-5, it was envisioned that the mini-MUC would have an event-level template pointing to templates representing the participants in the event (people, organizations, products, etc.), mediated perhaps by a "relational" level template.

To further increase portability, a proposal was made to standardize the lowest-level templates (for people, organizations, etc.), since these basic classes are involved in a wide variety of actions. In this way, MUC participants could develop code for these low-level templates once, and then use them with many different types of events. These low-level templates were named "template elements".

As the specification finally developed, the template element for organizations had six slots: the maximal organization name, any aliases, the type, a descriptive noun phrase, the locale (most specific location), and the country. Slots are filled only if the information is explicitly given in the text (or, in the case of the country, can be inferred from an explicit locale). The text

"We are striving to have a strong renewed creative partnership with Coca-Cola," Mr. Dooner says. However, odds of that happening are slim since word from Coke headquarters in Atlanta is that...

would yield an organization template element with five of these six slots filled:

<ORGANIZATION-9402240133-5> :=
ORG_NAME: "Coca-Cola"
ORG_ALIAS: "Coke"
ORG_TYPE: COMPANY
ORG_LOCALE: Atlanta CITY
ORG_COUNTRY: United States

(the first line identifies this as organization template 5 from article 9402240133).

Ever on the lookout for additional evaluation measures, the committee decided to make the creation of template elements for all the people and organizations in a text a separate MUC task. Like the named entity task, this was also seen as a potential demonstration of the ability of systems to perform a useful, relatively domain-independent task with near-term extraction technology (although it was recognized as being more difficult than named entity, since it required merging information from several places in the text). The old-style MUC information extraction task, based on a description of a particular class of events (a "scenario"), was called the "scenario template" task. A sample scenario template is shown in the appendix.

6 Measures of deep understanding

Another concern which was noted about the MUCs is that the systems were tending towards relatively shallow understanding techniques (based primarily on local pattern matching), and that not enough work was being done to build up the mechanisms needed for deeper understanding. Therefore, the committee, with strong encouragement from DARPA, included three MUC tasks which were intended to measure aspects of the internal processing of an information extraction or language understanding system. These three tasks, which were collectively called SemEval ("Semantic Evaluation"), were:

• Coreference: the system would have to mark coreferential noun phrases (the initial specification envisioned marking set-subset and part-whole relations, in addition to identity relations)
• Word sense disambiguation: for each open class word (noun, verb, adjective, adverb) in the text, the system would have to determine its sense using the Wordnet classification (its "synset", in Wordnet terminology)
• Predicate-argument structure: the system would have to create a tree interrelating the constituents of the sentence, using some set of grammatical functional relations

The committee recognized that, in selecting such internal measures, it was making some presumptions regarding the structures and decisions which an analyzer should make in understanding a document. Not everyone would share these presumptions, but participants in the next MUC would be free to enter the information extraction evaluation and skip some or all of these internal evaluations. Language understanding technology might develop in ways very different from those imagined by the committee, and these internal evaluations might turn out to be irrelevant distractions. However, from the current perspective of most of the committee, these seemed fairly basic aspects of understanding, and so an experiment in evaluating them (and encouraging improvement in them)
would be worthwhile.

7 Preparation process

Round 1: Resolution of SemEval

The committee had proposed a very ambitious program of evaluations. We now had to reduce these proposals to detailed specifications. The first step was to do some manual text annotation for the four tasks (named entity and the SemEval triad) which were quite different from what had been tried before. Brief specifications were prepared for each task, and in the spring of 1994 a group of volunteers (mostly veterans of earlier MUCs) annotated a short newspaper article using each set of specifications.

Problems arose with each of the SemEval tasks.

• For coreference, there were problems identifying part-whole and set-subset relations, and distinguishing the two; a decision was later made to limit ourselves to identity relations.
• For sense tagging, the annotators found that in some cases Wordnet made very fine distinctions and that making these distinctions consistently in tagging was very difficult.
• For predicate-argument structure, practically every new construct beyond simple clauses and noun phrases raised new issues which had to be collectively resolved.

Beyond these individual problems, it was felt that the menu was simply too ambitious, and that we would do better by concentrating on one element of the SemEval triad for MUC-6; at a meeting held in June 1994, a decision was made to go with coreference. In part, this reflected a feeling that the problems with the coreference specification were the most amenable to solution. It also reflected a conviction that coreference identification had been, and would remain, critical to success in information extraction, and so it was important to encourage advances in coreference. In contrast, most extraction systems did not build full predicate-argument structures, and word-sense disambiguation played a relatively small role in extraction (particularly since extraction systems operated in a narrow domain).

The coreference task, like the named entity task, was annotated using SGML notation. A COREF tag has an ID attribute which identifies the tagged noun phrase or pronoun. It may also have an attribute of the form REF=n, which indicates that this phrase is coreferential with the phrase with ID n. Figure 2 shows an excerpt from an article, annotated for coreference.⁶

⁶ The TYPE and MIN attributes which appear in the actual annotation have been omitted here for the sake of readability.

Round 2: annotation

The next step was the preparation of a substantial training corpus for the two novel tasks which remained (named entity and coreference). SRA Corporation kindly provided tools which aided in the annotation process. Again a stalwart group of volunteer annotators was assembled;⁷ each was provided with 25 articles from the Wall Street Journal. There was some overlap between the articles assigned, so that we could measure the consistency of annotation between sites. This annotation was done in the winter of 1994-95.

A major role of the annotation process was to identify and resolve problems with the task specifications. For named entities, this was relatively straightforward. For coreference, it proved remarkably difficult to formulate guidelines which were reasonably complete and consistent.⁸

⁷ The annotation groups were from BBN, Brandeis Univ., the Univ. of Durham, Lockheed-Martin, New Mexico State Univ., NRaD, New York Univ., PRC, the Univ. of Pennsylvania, SAIC (San Diego), SRA, SRI, the Univ. of Sheffield, Southern Methodist Univ., and Unisys.

⁸ As experienced computational linguists, we probably should have known better than to think this was an easy task.

Round 3: dry run

Once the task specifications seemed reasonably stable, NRaD organized a "dry run", a full-scale rehearsal for MUC-6, but with all results reported anonymously. The dry run took place in April 1995, with a scenario involving labor union contract negotiations. Of the sites which were involved in the annotation process, ten participated in the dry run. Results of the dry run were reported at the Tipster Phase II 12-month meeting in May 1995.

8 The formal evaluation

The MUC-6 formal evaluation was held in September 1995. The scenario definition was distributed at the beginning of September; the test data was distributed four weeks later, with results due by the end of the week. The scenario involved changes in corporate executive management personnel. The evaluation met many of the goals which had been set by the initial planning conference in December of 1993.

There were evaluations for four tasks: named entity, coreference, template element, and scenario template. There were 16 participants; 15 participated in the named entity task, 7 in coreference, 11 in template element, and 9 in scenario template.

Named entity was intended to be a simple task on which systems could demonstrate a high level of performance, high enough for immediate use. Our success in this task exceeded our
expectations. The majority of sites had recall and precision over 90%; the highest-scoring system had a recall of 96% and a precision of 97%. Although one must keep in mind the somewhat limited range of texts in the test set (all are from the Wall Street Journal, in particular), the results are excellent. A couple of these systems have been commercialized, and several are being incorporated into government text-processing systems. Given this level of performance, there is probably little point in repeating this task with the same ground rules in a future MUC (although there might be interest in processing monocase text and in performing comparable tasks on a more varied corpus and for languages other than English).

Maybe <COREF ID="136" REF="134">he</COREF>'ll even leave something from <COREF
ID="138" REF="139"><COREF ID="137" REF="136">his</COREF> office</COREF> for <COREF
ID="140" REF="91">Mr. Dooner</COREF>. Perhaps <COREF ID="144">a framed page from
the New York Times, dated Dec. 8, 1987, showing a year-end chart of the stock market
crash earlier that year</COREF>. <COREF ID="141" REF="137">Mr. James</COREF> says
<COREF ID="142" REF="141">he</COREF> framed <COREF ID="143" REF="144"
STATUS="OPT">it</COREF> and kept <COREF ID="145" REF="144">it</COREF> by <COREF
ID="146" REF="142">his</COREF> desk as a "personal reminder. It can all be gone like
that."

Figure 2: Sample coreference annotation.

The template element task, while superficially similar to named entities (it is also based on identifying people and organizations), is significantly more difficult. One has to identify descriptions of entities ("a distributor of kumquats") as well as names. If an entity is mentioned several times, possibly using descriptions or different forms of a name, these mentions need to be identified together; there should be only one template element for each entity in an article. Consequently, the scores were appreciably lower, ranging across most systems from 65% to 75% in recall, and from 75% to 85% in precision. The top-scoring system had 75% recall and 86% precision. Systems did particularly poorly in identifying descriptions; the highest-scoring system had 38% recall and 51% precision for descriptions.

There seemed to be general agreement that having prepared code for template elements in advance did make it easier to port a system to a new scenario in a few weeks. This factor, and the room that exists for improvement in performance, suggest that including this task in a future MUC may be worthwhile.

The goal for scenario templates (mini-MUC) was to demonstrate that effective information extraction systems could be created in a few weeks. This too was successful. Although it is difficult to meaningfully compare results on different scenarios, the scores obtained by most systems after a few weeks (40% to 50% recall, 60% to 70% precision) were comparable to the best scores obtained in prior MUCs. The highest performance overall was 47% recall and 70% precision.

One can observe an increasing convergence of methods for information extraction. Most of the systems participating in MUC-6 employed a cascade of finite-state pattern recognizers, with the earlier pattern sets recognizing entities, and the later sets recognizing scenario-specific patterns. This convergence may be one reason for the bunching of scores for this task: most systems fell in a rather narrow range in both recall and precision.

The results of this MUC provide valuable positive testimony on behalf of information extraction, but further improvement in both portability and performance is needed for many applications. With respect to portability, customers would like to have systems which can be ported in a few hours, or at most a few days, by someone with less expertise than a system developer. How this might be tested in the context of a MUC is not entirely clear. For one thing, most sites spent several days just studying the scenario description and annotated corpus, in order to understand the scenario definition, before coding began. Perhaps a micro-MUC,⁹ with an even simpler template structure, is needed to push the limits of portability. Getting systems which can be customized by others is also a tall order, given the complexity and variety of knowledge sources needed for a typical MUC information extraction task.

With respect to performance, the bunching of scores suggests that many sites were able to solve a common set of "easy" problems, but were stymied in processing messages which involved "hard" problems. Whether this is true, and just what the hard problems are, will require more extensive analysis of the results of MUC-6. Are the shortcomings due primarily to a lack of coverage in the basic patterns, to a lack of background knowledge in the domain, to failures in coreference, or to something else? We may hope that the failings are primarily in one area, so that we may concentrate our energies there, but more likely the failings will be in many areas, and broad improvements in extraction engines will be needed to improve performance.

⁹ A term suggested by George Krupka.
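The cascade of finite-state pattern recognizers described above can be illustrated with a toy example. The sketch below is not any participant's system, and the name lists, token format, and `start_job` label are hypothetical. Each stage is a list of regular expressions without backreferences (so each pattern describes a regular language, the kind a finite-state recognizer accepts). An early stage rewrites names into typed tokens, and a later, scenario-specific stage matches over those tokens, applied here to a sentence from the appendix.

```python
import re

# Stage 1 (entities): hypothetical, hand-picked name patterns that rewrite
# recognized names into typed tokens such as "[PERSON Peter Kim]".
ENTITY_PATTERNS = [
    (re.compile(r"\b(Peter Kim|Martin Puris)\b"), r"[PERSON \1]"),
    (re.compile(r"\b(McCann|J\. Walter Thompson)\b"), r"[ORG \1]"),
]

# Stage 2 (scenario): a hypothetical succession pattern that refers only to
# the typed tokens produced by stage 1, not to particular names.
SCENARIO_PATTERN = re.compile(
    r"\[PERSON (?P<person>[^\]]+)\] was hired from .*?\[ORG (?P<org>[^\]]+)\]"
)


def run_cascade(text: str) -> list[dict[str, str]]:
    # Each entity pattern rewrites the output of the previous one.
    for pattern, replacement in ENTITY_PATTERNS:
        text = pattern.sub(replacement, text)
    # The scenario stage then runs over the entity-annotated text.
    return [
        {"event": "start_job", "person": m["person"], "from_org": m["org"]}
        for m in SCENARIO_PATTERN.finditer(text)
    ]


sample = ("In addition, Peter Kim was hired from WPP Group's "
          "J. Walter Thompson last September as vice chairman.")
print(run_cascade(sample))
```

Because the scenario stage mentions only `[PERSON ...]` and `[ORG ...]` tokens, moving to a new scenario under this design would mean rewriting the later stage while reusing the entity stage.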
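In the coreference annotation of Section 7, each REF attribute links a phrase to an antecedent ID. Coreference chains (equivalence classes of phrases) can therefore be recovered by taking the transitive closure of these links. The sketch below does this with a union-find structure for the (ID, REF) pairs in Figure 2. It is an illustrative assumption, not the MUC-6 scoring software. It ignores STATUS="OPT", and IDs whose antecedents lie outside the excerpt (such as 91 and 134) simply appear as members of their chains.

```python
from typing import Optional

# (ID, REF) pairs transcribed from the Figure 2 excerpt; REF is None when a
# phrase has no antecedent. STATUS="OPT" on ID 143 is ignored here.
LINKS: list[tuple[str, Optional[str]]] = [
    ("136", "134"), ("138", "139"), ("137", "136"), ("140", "91"),
    ("144", None), ("141", "137"), ("142", "141"), ("143", "144"),
    ("145", "144"), ("146", "142"),
]


def find(parent: dict[str, str], x: str) -> str:
    """Return the representative of x's class (union-find with path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def coreference_chains(links: list[tuple[str, Optional[str]]]) -> list[set[str]]:
    """Group phrase IDs into chains: the transitive closure of the REF links."""
    parent: dict[str, str] = {}
    for phrase_id, ref in links:
        root = find(parent, phrase_id)
        if ref is not None:
            parent[root] = find(parent, ref)  # merge the two classes
    chains: dict[str, set[str]] = {}
    for phrase_id in list(parent):
        chains.setdefault(find(parent, phrase_id), set()).add(phrase_id)
    # Sort chains by their string-sorted members for a stable order.
    return sorted(chains.values(), key=sorted)


for chain in coreference_chains(LINKS):
    print(sorted(chain))
# ['134', '136', '137', '141', '142', '146']
# ['138', '139']
# ['140', '91']
# ['143', '144', '145']
```

The first chain collects "he", "his", "Mr. James", "he", and "his" (together with antecedent 134 outside the excerpt), which is the kind of grouping a coreference evaluation has to compare against the key.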
Pushing improvements in the underlying technology was one of the goals of SemEval and its current survivor, coreference. Much of the energy for the current round, however, went into honing the definition of the task. Philosophers of language have been arguing over reference and coreference for centuries, so we should not have been surprised that it would be so hard to prepare a precise and consistent definition. Additional work on the definition will be necessary, and it may be necessary to narrow the task further. Despite these distractions, a few interesting early results were obtained regarding coreference methods. We may hope that, once the task specification settles down, the availability of coreference-annotated corpora and the chance for glory in further evaluations will encourage more work in this area.

Appendix: Sample Scenario Template

Shown below is a set of templates for the MUC-6 scenario template task. The scenario involved changes in corporate executive management personnel. For the text

McCann has initiated a new so-called global collaborative system, composed of world-wide account directors paired with creative partners. In addition, Peter Kim was hired from WPP Group's J. Walter Thompson last September as vice chairman, chief strategy officer, world-wide.

the following templates were to be generated:

<SUCCESSION_EVENT-9402240133-3> :=
SUCCESSION_ORG: <ORGANIZATION-9402240133-1>
POST: "vice chairman, chief strategy officer, world-wide"
IN_AND_OUT: <IN_AND_OUT-9402240133-5>
VACANCY_REASON: OTH_UNK
<IN_AND_OUT-9402240133-5> :=
IO_PERSON: <PERSON-9402240133-5>
NEW_STATUS: IN
ON_THE_JOB: YES
OTHER_ORG: <ORGANIZATION-9402240133-8>
REL_OTHER_ORG: OUTSIDE_ORG
<ORGANIZATION-9402240133-1> :=
ORG_NAME: "McCann"
ORG_TYPE: COMPANY
<ORGANIZATION-9402240133-8> :=
ORG_NAME: "J. Walter Thompson"
ORG_TYPE: COMPANY
<PERSON-9402240133-5> :=
PER_NAME: "Peter Kim"

Although we cannot explain all the details of the templates here, a few highlights should be noted. For each executive post, one generates a SUCCESSION_EVENT template, which contains references to the ORGANIZATION template for the organization involved and to the IN_AND_OUT template for the activity involving that post (if an article describes a person leaving and a person starting the same job, there will be two IN_AND_OUT templates). The IN_AND_OUT template contains references to the templates for the PERSON and for the ORGANIZATION from which the person came (if he or she is starting a new job). The PERSON and ORGANIZATION templates are the "template element" templates, which are invariant across scenarios.
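The pointer structure of these templates maps naturally onto records that hold references to other records. The sketch below is a hypothetical model, not MUC-6 software. The class `Template` and function `render` are invented for illustration, each slot holds a single fill (so the case of two IN_AND_OUT templates would need a list-valued slot), and the output follows the `<TYPE-ARTICLE-N> :=` style above. Templates are listed depth first, so their order may differ from the listing above.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Template:
    """One filled template: its type, article number, index, and slot fills.

    A fill is either a string or a pointer to another Template, mirroring the
    hierarchical structure above (hypothetical model, one fill per slot).
    """
    ttype: str
    doc_id: str
    index: int
    fills: dict[str, Union[str, "Template"]] = field(default_factory=dict)

    @property
    def label(self) -> str:
        return f"<{self.ttype}-{self.doc_id}-{self.index}>"


def render(root: Template) -> str:
    """List root and every template reachable from it, each exactly once."""
    lines: list[str] = []
    seen: set[str] = set()
    stack = [root]
    while stack:
        template = stack.pop()
        if template.label in seen:
            continue
        seen.add(template.label)
        lines.append(f"{template.label} :=")
        pointed_to = []
        for slot, fill in template.fills.items():
            if isinstance(fill, Template):
                lines.append(f"{slot}: {fill.label}")  # a pointer fill
                pointed_to.append(fill)
            else:
                lines.append(f"{slot}: {fill}")  # a string or set fill
        # Push in reverse so the first pointed-to template is visited first.
        stack.extend(reversed(pointed_to))
    return "\n".join(lines)


DOC = "9402240133"
mccann = Template("ORGANIZATION", DOC, 1,
                  {"ORG_NAME": '"McCann"', "ORG_TYPE": "COMPANY"})
jwt = Template("ORGANIZATION", DOC, 8,
               {"ORG_NAME": '"J. Walter Thompson"', "ORG_TYPE": "COMPANY"})
kim = Template("PERSON", DOC, 5, {"PER_NAME": '"Peter Kim"'})
in_and_out = Template("IN_AND_OUT", DOC, 5, {
    "IO_PERSON": kim, "NEW_STATUS": "IN", "ON_THE_JOB": "YES",
    "OTHER_ORG": jwt, "REL_OTHER_ORG": "OUTSIDE_ORG",
})
event = Template("SUCCESSION_EVENT", DOC, 3, {
    "SUCCESSION_ORG": mccann,
    "POST": '"vice chairman, chief strategy officer, world-wide"',
    "IN_AND_OUT": in_and_out,
    "VACANCY_REASON": "OTH_UNK",
})
print(render(event))
```

Because the ORGANIZATION and PERSON records are built independently of the event, the same template-element code could, under this design, be reused with a different scenario-level record.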
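The recall and precision definitions of Section 2 can be made concrete with a small scorer. The sketch below is hypothetical and much simpler than a real MUC scorer. It compares one response template against one key template and treats every fill as a plain string. An exact match counts as correct and any other non-empty response fill as incorrect. It leaves out issues such as deciding which response template corresponds to which key template. The key is the organization template element from Section 5, and the response is invented for illustration.

```python
def score(key: dict[str, str], response: dict[str, str]) -> tuple[float, float]:
    """Return (recall, precision) for one response template against its key.

    Hypothetical simplification of the Section 2 definitions: N_key counts
    filled key slots; a non-empty response fill is correct if it equals the
    key fill for that slot and incorrect otherwise; empty fills are ignored.
    """
    n_key = sum(1 for fill in key.values() if fill)
    n_correct = n_incorrect = 0
    for slot, fill in response.items():
        if not fill:
            continue  # slot left unfilled by the system
        if key.get(slot) == fill:
            n_correct += 1
        else:
            n_incorrect += 1
    recall = n_correct / n_key if n_key else 0.0
    attempted = n_correct + n_incorrect
    precision = n_correct / attempted if attempted else 0.0
    return recall, precision


key = {
    "ORG_NAME": "Coca-Cola", "ORG_ALIAS": "Coke", "ORG_TYPE": "COMPANY",
    "ORG_LOCALE": "Atlanta CITY", "ORG_COUNTRY": "United States",
}
# Invented system response: three correct fills, one wrong, locale missing.
response = {
    "ORG_NAME": "Coca-Cola", "ORG_ALIAS": "Coke", "ORG_TYPE": "COMPANY",
    "ORG_COUNTRY": "USA",
}
recall, precision = score(key, response)
print(f"recall={recall:.0%} precision={precision:.0%}")  # recall=60% precision=75%
```

The missing locale lowers recall but not precision, while the wrong country lowers both. This is the trade-off the two measures are designed to separate.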