



                    Message Understanding Conference - 6:
                               A Brief History

         Ralph Grishman                          Beth Sundheim
    Dept. of Computer Science             Naval Command, Control and
       New York University                 Ocean Surveillance Center
     715 Broadway, 7th Floor            Research, Development, Test and
    New York, NY 10003, USA               Evaluation Division (NRaD)
      grishman@cs.nyu.edu                         Code 44208
                                              53140 Gatchell Road
                                       San Diego, California 92152-7420
                                          sundheim@pojke.nosc.mil
                    Abstract

     We have recently completed the sixth in a series of "Message Understanding Conferences" which are designed to promote and evaluate research in information extraction. MUC-6 introduced several innovations over prior MUCs, most notably in the range of different tasks for which evaluations were conducted. We describe some of the motivations for the new format and briefly discuss some of the results of the evaluations.

1    The MUC Evaluations

We have just completed the sixth in a series of Message Understanding Conferences, which have been organized by NRaD, the RDT&E division of the Naval Command, Control and Ocean Surveillance Center (formerly NOSC, the Naval Ocean Systems Center) with the support of DARPA, the Defense Advanced Research Projects Agency. This paper looks briefly at the history of these Conferences and then examines the considerations which led to the structure of MUC-6.¹

    ¹ The full proceedings of the conference are to be distributed by Morgan Kaufmann Publishers, San Mateo, California; earlier MUC proceedings, for MUC-3, 4, and 5, are also available from Morgan Kaufmann.

The Message Understanding Conferences were initiated by NOSC to assess and to foster research on the automated analysis of military messages containing textual information. Although called "conferences", the distinguishing characteristic of the MUCs is not the conferences themselves, but the evaluations to which participants must submit in order to be permitted to attend the conference. For each MUC, participating groups have been given sample messages and instructions on the type of information to be extracted, and have developed a system to process such messages. Then, shortly before the conference, participants are given a set of test messages to be run through their system (without making any changes to the system); the output of each participant's system is then evaluated against a manually-prepared answer key.

The MUCs are remarkable in part because of the degree to which these evaluations have defined a program of research and development. DARPA has a number of information science and technology programs which are driven in large part by regular evaluations. The MUCs are notable, however, in that they in large part have shaped the research program in information extraction and brought it to its current state.²

    ² There were, however, a number of individual research efforts in information extraction underway before the first MUC, including the work on information formatting of medical narrative by Sager at New York University; the formatting of naval equipment failure reports by Marsh at the Naval Research Laboratory; and the DBG work by Logicon for RADC.

2    Early History

MUC-1 (1987) was basically exploratory; each group designed its own format for recording the information in the document, and there was no formal evaluation. By MUC-2 (1989), the task had crystallized as one of template filling. One receives a description of a class of events to be identified in the text; for each of these events one must fill a template with information about the event. The template has slots for information about the event, such as the type of event, the agent, the time and place, the effect, etc. For MUC-2, the template had 10 slots. Both MUC-1 and MUC-2 involved sanitized forms of military messages about naval sightings and engagements.

The second MUC also worked out the details of the primary evaluation measures, recall and precision. To present them in simplest terms, suppose the answer key has $N_{\mathrm{key}}$ filled slots; and that a system fills $N_{\mathrm{correct}}$ slots correctly and $N_{\mathrm{incorrect}}$ incorrectly (with some other slots possibly left unfilled). Then

$$\mathrm{recall} = \frac{N_{\mathrm{correct}}}{N_{\mathrm{key}}}$$
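A minimal sketch of how the two primary measures might be computed from slot counts: the recall just defined, and the precision given directly below (correct fills divided by all fills attempted). The function name `score_slots`, the example counts, and the convention of returning 0.0 for an empty denominator are assumptions made for illustration; they are not taken from the MUC scoring software.

```python
def score_slots(n_key: int, n_correct: int, n_incorrect: int) -> tuple:
    """Return (recall, precision) computed from slot counts.

    n_key       -- number of filled slots in the answer key
    n_correct   -- number of slots the system filled correctly
    n_incorrect -- number of slots the system filled incorrectly
    """
    # Assumption: report 0.0 when a denominator is zero; the paper does not
    # say how that case is handled.
    recall = n_correct / n_key if n_key else 0.0
    attempted = n_correct + n_incorrect
    precision = n_correct / attempted if attempted else 0.0
    return recall, precision


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the two formulas.
    recall, precision = score_slots(n_key=10, n_correct=6, n_incorrect=2)
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

Slots that the system leaves unfilled lower recall (they are counted in the key but not in the correct fills) without affecting precision, which is why the two measures are reported together.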





$$\mathrm{precision} = \frac{N_{\mathrm{correct}}}{N_{\mathrm{correct}} + N_{\mathrm{incorrect}}}$$

For MUC-3 (1991), the task shifted to reports of terrorist events in Central and South America, as reported in articles provided by the Foreign Broadcast Information Service, and the template became somewhat more complex (18 slots). This same task was used for MUC-4 (1992), with a further small increase in template complexity (24 slots).

MUC-5 (1993), which was conducted as part of the Tipster program,³ represented a substantial further jump in task complexity. Two tasks were involved, international joint ventures and electronic circuit fabrication, in two languages, English and Japanese. The joint venture task required 11 templates with a total of 47 slots for the output -- double the number of slots defined for MUC-4 -- and the task documentation was over 40 pages long.

    ³ Tipster is a U.S. Government program of research and development in the areas of information retrieval and information extraction.

One innovation of MUC-5 was the use of a nested template structure. In earlier MUCs, each event had been represented as a single template -- in effect, a single record in a data base, with a large number of attributes. This format proved awkward when an event had several participants (e.g., several victims of a terrorist attack) and one wanted to record a set of facts about each participant. This sort of information could be much more easily recorded in the hierarchical structure introduced for MUC-5, in which there was a single template for an event, which pointed to a list of templates, one for each participant in the event.⁴

    ⁴ In fact the MUC-5 structure was much more complex, because there were separate templates for products, time, activities of organizations, etc.

3    MUC-6: initial goals

DARPA convened a meeting of Tipster participants and government representatives in December 1993 to define goals and tasks for MUC-6.⁵ Among the goals which were identified were

    • demonstrating task-independent component technologies of information extraction which would be immediately useful

    • encouraging work to make information extraction systems more portable

    • encouraging work on "deeper understanding"

    ⁵ The representatives of the research community were Jim Cowie, Ralph Grishman (committee chair), Jerry Hobbs, Paul Jacobs, Len Schubert, Carl Weir, and Ralph Weischedel. The government people attending were George Doddington, Donna Harman, Boyan Onyshkevych, John Prange, Bill Schultheis, and Beth Sundheim.

Each of these can be seen in part as a reaction to the trends in the prior MUCs. The MUC-5 tasks, in particular, had been quite complex and a great effort had been invested by the government in preparing the training and test data and by the participants in adapting their systems for these tasks. Most participants worked on the tasks for 6 months; a few (the Tipster contractors) had been at work on the tasks for considerably longer. While the performance of some systems was quite impressive (the best got 57% recall, 64% precision overall, with 73% recall and 74% precision on the 4 "core" template types), the question naturally arose as to whether there were many applications for which an investment of one or several developers over half-a-year (or more) could be justified.

Furthermore, while so much effort had been expended, a large portion was specific to the particular tasks. It wasn't clear whether much progress was being made on the underlying technologies which would be needed for better understanding.

To address these goals, the meeting formulated an ambitious menu of tasks for MUC-6, with the idea that individual participants could choose a subset of these tasks. We consider the three goals in the three sections below, and describe the tasks which were developed to address each goal.

4    Short-term subtasks

The first goal was to identify, from the component technologies being developed for information extraction, functions which would be of practical use, would be largely domain independent, and could in the near term be performed automatically with high accuracy. To meet this goal the committee developed the "named entity" task, which basically involves identifying the names of all the people, organizations, and geographic locations in a text.

The final task specification, which also involved time, currency, and percentage expressions, used SGML markup to identify the names in a text. Figure 1 shows a sample sentence with named entity annotations. The tag ENAMEX ("entity name expression") is used for both people and organization names; the tag NUMEX ("numeric expression") is used for currency and percentages.

5    Portability

The second goal was to focus on portability in the information extraction task -- the ability to rapidly retarget a system to extract information about a different class of events. The committee felt that it was important to demonstrate that useful extraction systems could be created in a few weeks. To meet this goal, we decided that the information extraction task for MUC-6 would have to involve a relatively simple template, more like MUC-2 than MUC-5; this was dubbed "mini-





Mr. <ENAMEX TYPE="PERSON">Dooner</ENAMEX> met with <ENAMEX TYPE="PERSON">Martin
Puris</ENAMEX>, president and chief executive officer of <ENAMEX
TYPE="ORGANIZATION">Ammirati & Puris</ENAMEX>, about <ENAMEX
TYPE="ORGANIZATION">McCann</ENAMEX>'s acquiring the agency with billings of <NUMEX
TYPE="MONEY">$400 million</NUMEX>, but nothing has materialized.

                                Figure 1: Sample named entity annotation.
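A minimal sketch of how annotations like those in Figure 1 might be read back out of a text. It assumes the simple, non-nested ENAMEX and NUMEX tags with a single TYPE attribute that the figure shows; the regular expression and the function name `extract_entities` are illustrative assumptions, not part of the MUC-6 task definition or its scoring software.

```python
import re

# Matches <ENAMEX TYPE="...">text</ENAMEX> and <NUMEX TYPE="...">text</NUMEX>.
# Assumption: tags are not nested and carry only a TYPE attribute.
TAG_RE = re.compile(
    r'<(ENAMEX|NUMEX)\s+TYPE="([^"]+)">(.*?)</\1>',
    re.DOTALL,
)


def extract_entities(sgml):
    """Return (tag, type, text) triples in document order."""
    entities = []
    for tag, entity_type, text in TAG_RE.findall(sgml):
        # Collapse line breaks inside an annotated span into single spaces.
        entities.append((tag, entity_type, " ".join(text.split())))
    return entities


# The sentence from Figure 1, with its line breaks preserved.
SAMPLE = (
    'Mr. <ENAMEX TYPE="PERSON">Dooner</ENAMEX> met with '
    '<ENAMEX TYPE="PERSON">Martin\nPuris</ENAMEX>, president and chief '
    'executive officer of <ENAMEX\nTYPE="ORGANIZATION">Ammirati & Puris'
    '</ENAMEX>, about <ENAMEX\nTYPE="ORGANIZATION">McCann</ENAMEX>\'s '
    'acquiring the agency with billings of <NUMEX\nTYPE="MONEY">$400 '
    'million</NUMEX>, but nothing has materialized.'
)

if __name__ == "__main__":
    for entity in extract_entities(SAMPLE):
        print(entity)
```

On the Figure 1 sentence this yields five triples: two PERSON names, two ORGANIZATION names, and one MONEY expression.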


MUC". In keeping with the hierarchical template structure introduced in MUC-5, it was envisioned that the mini-MUC would have an event-level template pointing to templates representing the participants in the event (people, organizations, products, etc.), mediated perhaps by a "relational" level template.

To further increase portability, a proposal was made to standardize the lowest-level templates (for people, organizations, etc.), since these basic classes are involved in a wide variety of actions. In this way, MUC participants could develop code for these low-level templates once, and then use them with many different types of events. These low-level templates were named "template elements".

As the specification finally developed, the template element for organizations had six slots, for the maximal organization name, any aliases, the type, a descriptive noun phrase, the locale (most specific location), and country. Slots are filled only if information is explicitly given in the text (or, in the case of the country, can be inferred from an explicit locale). The text

     "We are striving to have a strong renewed creative partnership with Coca-Cola," Mr. Dooner says. However, odds of that happening are slim since word from Coke headquarters in Atlanta is that...

would yield an organization template element with five of these six slots filled:

<ORGANIZATION-9402240133-5> :=
     ORG_NAME: "Coca-Cola"
     ORG_ALIAS: "Coke"
     ORG_TYPE: COMPANY
     ORG_LOCALE: Atlanta CITY
     ORG_COUNTRY: United States

(the first line identifies this as organization template 5 from article 9402240133).

Ever on the lookout for additional evaluation measures, the committee decided to make the creation of template elements for all the people and organizations in a text a separate MUC task. Like the named entity task, this was also seen as a potential demonstration of the ability of systems to perform a useful, relatively domain independent task with near-term extraction technology (although it was recognized as being more difficult than named entity, since it required merging information from several places in the text). The old-style MUC information extraction task, based on a description of a particular class of events (a "scenario") was called the "scenario template" task. A sample scenario template is shown in the appendix.

6    Measures of deep understanding

Another concern which was noted about the MUCs is that the systems were tending towards relatively shallow understanding techniques (based primarily on local pattern matching), and that not enough work was being done to build up the mechanisms needed for deeper understanding. Therefore, the committee, with strong encouragement from DARPA, included three MUC tasks which were intended to measure aspects of the internal processing of an information extraction or language understanding system. These three tasks, which were collectively called SemEval ("Semantic Evaluation") were:

    • Coreference: the system would have to mark coreferential noun phrases (the initial specification envisioned marking set-subset and part-whole relations, in addition to identity relations)

    • Word sense disambiguation: for each open class word (noun, verb, adjective, adverb) in the text, the system would have to determine its sense using the Wordnet classification (its "synset", in Wordnet terminology)

    • Predicate-argument structure: the system would have to create a tree interrelating the constituents of the sentence, using some set of grammatical functional relations

The committee recognized that, in selecting such internal measures, it was making some presumption regarding the structures and decisions which an analyzer should make in understanding a document. Not everyone would share these presumptions, but participants in the next MUC would be free to enter the information extraction evaluation and skip some or all of these internal evaluations. Language understanding technology might develop in ways very different from those imagined by the committee, and these internal evaluations might turn out to be irrelevant distractions. However, from the current perspective of most of the committee, these seemed fairly basic aspects of understanding, and so an experiment in evaluating them (and encouraging improvement in them)





would be worthwhile.

7    Preparation process

Round 1: Resolution of SemEval

The committee had proposed a very ambitious program of evaluations. We now had to reduce these proposals to detailed specifications. The first step was to do some manual text annotation for the four tasks -- named entity and the SemEval triad -- which were quite different from what had been tried before. Brief specifications were prepared for each task, and in the spring of 1994 a group of volunteers (mostly veterans of earlier MUCs) annotated a short newspaper article using each set of specifications.

Problems arose with each of the SemEval tasks.

    • For coreference, there were problems identifying part-whole and set-subset relations, and distinguishing the two; a decision was later made to limit ourselves to identity relations.

    • For sense tagging, the annotators found that in some cases Wordnet made very fine distinctions and that making these distinctions consistently in tagging was very difficult.

    • For predicate-argument structure, practically every new construct beyond simple clauses and noun phrases raised new issues which had to be collectively resolved.

Beyond these individual problems, it was felt that the menu was simply too ambitious, and that we would do better by concentrating on one element of the SemEval triad for MUC-6; at a meeting held in June 1994, a decision was made to go with coreference. In part, this reflected a feeling that the problems with the coreference specification were the most amenable to solution. It also reflected a conviction that coreference identification had been, and would remain, critical to success in information extraction, and so it was important to encourage advances in coreference. In contrast, most extraction systems did not build full predicate-argument structures, and word-sense disambiguation played a relatively small role in extraction (particularly since extraction systems operated in a narrow domain).

The coreference task, like the named entity task, was annotated using SGML notation. A COREF tag has an ID attribute which identifies the tagged noun phrase or pronoun. It may also have an attribute of the form REF=n, which indicates that this phrase is coreferential with the phrase with ID n. Figure 2 shows an excerpt from an article, annotated for coreference.⁶

    ⁶ The TYPE and MIN attributes which appear in the actual annotation have been omitted here for the sake of readability.

Round 2: annotation

The next step was the preparation of a substantial training corpus for the two novel tasks which remained (named entity and coreference). SRA Corporation kindly provided tools which aided in the annotation process. Again a stalwart group of volunteer annotators was assembled;⁷ each was provided with 25 articles from the Wall Street Journal. There was some overlap between the articles assigned, so that we could measure the consistency of annotation between sites. This annotation was done in the winter of 1994-95.

    ⁷ The annotation groups were from BBN, Brandeis Univ., the Univ. of Durham, Lockheed-Martin, New Mexico State Univ., NRaD, New York Univ., PRC, the Univ. of Pennsylvania, SAIC (San Diego), SRA, SRI, the Univ. of Sheffield, Southern Methodist Univ., and Unisys.

A major role of the annotation process was to identify and resolve problems with the task specifications. For named entities, this was relatively straightforward. For coreference, it proved remarkably difficult to formulate guidelines which were reasonably complete and consistent.⁸

    ⁸ As experienced computational linguists, we probably should have known better than to think this was an easy task.

Round 3: dry run

Once the task specifications seemed reasonably stable, NRaD organized a "dry run" -- a full-scale rehearsal for MUC-6, but with all results reported anonymously. The dry run took place in April 1995, with a scenario involving labor union contract negotiations. Of the sites which were involved in the annotation process, ten participated in the dry run. Results of the dry run were reported at the Tipster Phase II 12-month meeting in May 1995.

8    The formal evaluation

The MUC-6 formal evaluation was held in September 1995. The scenario definition was distributed at the beginning of September; the test data was distributed four weeks later, with results due by the end of the week. The scenario involved changes in corporate executive management personnel. The evaluation met many of the goals which had been set by the initial planning conference in December of 1993.

There were evaluations for four tasks: named entity, coreference, template element, and scenario template. There were 16 participants; 15 participated in the named entity task, 7 in coreference, 11 in template element, and 9 in scenario template.

Named entity was intended to be a simple task on which systems could demonstrate a high level of performance -- high enough for immediate use. Our success in this task exceeded our





Maybe <COREF ID="136" REF="134">he</COREF>'ll even leave something from <COREF
ID="138" REF="139"><COREF ID="137" REF="136">his</COREF> office</COREF> for <COREF
ID="140" REF="91">Mr. Dooner</COREF>. Perhaps <COREF ID="144">a framed page from
the New York Times, dated Dec. 8, 1987, showing a year-end chart of the stock market
crash earlier that year</COREF>. <COREF ID="141" REF="137">Mr. James</COREF> says
<COREF ID="142" REF="141">he</COREF> framed <COREF ID="143" REF="144"
STATUS="OPT">it</COREF> and kept <COREF ID="145" REF="144">it</COREF> by <COREF
ID="146" REF="142">his</COREF> desk as a "personal reminder. It can all be gone like
that."

                                   Figure 2: Sample coreference annotation.
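A minimal sketch of how the ID and REF attributes in annotations such as Figure 2 might be followed to group mentions into coreference chains. It relies only on the conventions described for the COREF tag (every tag has an ID; an optional REF names a coreferential phrase). The union-find grouping and the names `coref_links` and `coref_chains` are illustrative choices, not the method used by the MUC-6 scorer.

```python
import re
from collections import defaultdict

OPEN_TAG_RE = re.compile(r"<COREF\s+([^>]*)>")
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def coref_links(sgml):
    """Return (ID, REF) for each COREF opening tag; REF is None if absent."""
    links = []
    for attr_text in OPEN_TAG_RE.findall(sgml):
        attrs = dict(ATTR_RE.findall(attr_text))
        links.append((attrs["ID"], attrs.get("REF")))
    return links


def coref_chains(links):
    """Group IDs connected by REF links into sorted chains (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for mention_id, ref in links:
        root = find(mention_id)
        if ref is not None:
            parent[root] = find(ref)

    chains = defaultdict(list)
    for mention_id in list(parent):
        chains[find(mention_id)].append(mention_id)
    return sorted(sorted(chain) for chain in chains.values())


# The last sentence of the Figure 2 excerpt, shortened.
SAMPLE = (
    '<COREF ID="141" REF="137">Mr. James</COREF> says '
    '<COREF ID="142" REF="141">he</COREF> framed '
    '<COREF ID="143" REF="144" STATUS="OPT">it</COREF> and kept '
    '<COREF ID="145" REF="144">it</COREF> by '
    '<COREF ID="146" REF="142">his</COREF> desk'
)

if __name__ == "__main__":
    for chain in coref_chains(coref_links(SAMPLE)):
        print(chain)
```

Because REF links are treated as undirected and transitive, the mentions of Mr. James (IDs 137, 141, 142, 146) fall into one chain and the mentions of the framed page (IDs 143, 144, 145) into another, even though no single tag lists the whole chain.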


expectations. The majority of sites had recall              overall was 47% recall and 70% precision.
and precision over 90%; the highest-scoring sys-               One can observe an increasing convergence of
tem had a recall of 96% and a precision of 97%.             methods for information extraction. Most of
Although one must keep in mind the somewhat                 the systems participating in MUC-6 employed a
limited range of texts in the test set (all are from        cascade of finite-state pattern recognizers, with
the Wall Street Journal, in particular), the re-            the earlier pattern sets recognizing entities, and
sults are excellent. A couple of these systems have         the later sets recognizing scenario-specific pat-
been commercialized, and several are being incor-           terns. This convergence may be one reason for
porated into government text-processing systems.            the bunching of scores for this task -- most sys-
Given this level of performance, there is probably          tems fell in a rather narrow range in both recall
little point in repeating this task with the same           and precision.
ground rules in a future MUC (although there                   The results of this MUC provide valuable pos-
might be interest in processing monocase text and           itive testimony on behalf of information extrac-
in performing comparable tasks on a more varied             tion, but further improvement in both portability
corpus and for languages other than English).               and performance is needed for many applications.
   The template element task, while superfi-                With respect to portability, customers would like
cially similar to named entities -- it is also based        to have systems which can be ported in a few
on identifying people and organizations -- is sig-          hours, or at most a few days, by someone with
nificantly more difficult. One has to identify de-          less expertise than a system developer. How this
scriptions of entities ("a distributor of kumquats")        might be tested in the context of a MUC is not en-
as well as names. If an entity is mentioned sev-            tirely clear. For one thing, most sites spent several
eral times, possibly using descriptions or differ-          days just studying the scenario description and
ent forms of a name, these need to be identified            annotated corpus, in order to understand the sce-
together; there should be only one template ele-            nario definition, before coding began. Perhaps a
ment for each entity in an article. Consequently,           micro-MUC,⁹ with an even simpler template struc-
the scores were appreciably lower, ranging across           ture, is needed to push the limits of portability.
most systems from 65 to 75% in recall, and from             Getting systems which can be customized by oth-
75% to 85% in precision. The top-scoring sys-               ers is also a tall order, given the complexity and
tem had 75% recall, 86% precision. Systems did              variety of knowledge sources needed for a typical
particularly poorly in identifying descriptions; the        MUC information extraction task.
highest-scoring system had 38% recall and 51%                  With respect to performance, the bunching of
precision for descriptions.                                 scores suggests that many sites were able to solve a
   There seemed general agreement that having               common set of "easy" problems, but were stymied
prepared code for template elements in advance              in processing messages which involved "hard"
did make it easier to port a system to a new sce-           problems. Whether this is true, and just what
nario in a few weeks. This factor, and the room             the hard problems are, will require more extensive
that exists for improvement in performance, sug-            analysis of the results of MUC-6. Are the short-
gest that including this task in a future MUC may           comings due primarily to a lack of coverage in the
be worthwhile.                                              basic patterns, to a lack of background knowledge
   The goal for scenario templates -- mini-                 in the domain, to failures in coreference, or some-
MUC -- was to demonstrate that effective infor-             thing else? We may hope that the failings are
mation extraction systems could be created in a             primarily in one area, so that we may concentrate
few weeks. This too was successful. Although it is          our energies there, but more likely the failings will
difficult to meaningfully compare results on differ-        be in many areas, and broad improvements in ex-
ent scenarios, the scores obtained by most systems          traction engines will be needed to improve perfor-
after a few weeks (40% to 50% recall, 60% to 70%            mance.
precision) were comparable to the best scores ob-
tained in prior MUCs. The highest performance                  9a term suggested by George Krupka

   Pushing improvements in the underlying technology was one of the
goals of SemEval and its current survivor, coreference. Much of the
energy for the current round, however, went into honing the definition
of the task. Philosophers of language have been arguing over reference
and coreference for centuries, so we should not have been surprised
that it would be so hard to prepare a precise and consistent
definition. Additional work on the definition will be necessary, and
it may be necessary to narrow the task further. Despite these
distractions, a few interesting early results were obtained regarding
coreference methods; we may hope that, once the task specification
settles down, the availability of coreference-annotated corpora and
the chance for glory in further evaluations will encourage more work
in this area.

Appendix: Sample Scenario Template
Shown below is a set of templates for the MUC-6 scenario template
task. The scenario involved changes in corporate executive management
personnel. For the text
      McCann has initiated a new so-called global collaborative
      system, composed of world-wide account directors paired with
      creative partners. In addition, Peter Kim was hired from WPP
      Group's J. Walter Thompson last September as vice chairman,
      chief strategy officer, world-wide.
the following templates were to be generated:
<SUCCESSION_EVENT-9402240133-3> :=
        SUCCESSION_ORG: <ORGANIZATION-9402240133-1>
        POST: "vice chairman, chief strategy officer, world-wide"
        IN_AND_OUT: <IN_AND_OUT-9402240133-5>
        VACANCY_REASON: OTH_UNK
<IN_AND_OUT-9402240133-5> :=
        IO_PERSON: <PERSON-9402240133-5>
        NEW_STATUS: IN
        ON_THE_JOB: YES
        OTHER_ORG: <ORGANIZATION-9402240133-8>
        REL_OTHER_ORG: OUTSIDE_ORG
<ORGANIZATION-9402240133-1> :=
        ORG_NAME: "McCann"
        ORG_TYPE: COMPANY
<ORGANIZATION-9402240133-8> :=
        ORG_NAME: "J. Walter Thompson"
        ORG_TYPE: COMPANY
<PERSON-9402240133-5> :=
        PER_NAME: "Peter Kim"

Although we cannot explain all the details of the template here, a few
highlights should be noted. For each executive post, one generates a
SUCCESSION_EVENT template, which contains references to the
ORGANIZATION template for the organization involved, and the
IN_AND_OUT template for the activity involving that post (if an
article describes a person leaving and a person starting the same job,
there will be two IN_AND_OUT templates). The IN_AND_OUT template
contains references to the templates for the PERSON and for the
ORGANIZATION from which the person came (if he/she is starting a new
job). The PERSON and ORGANIZATION templates are the "template element"
templates, which are invariant across scenarios.
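
The way these templates point to one another can be sketched in a few
lines of Python. This is an illustration only, not part of the MUC-6
task definition: the class names, field names, and the helper
functions build_sample and describe are assumptions chosen to mirror
the slot names in the sample, and only the slots that appear in the
sample are modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Organization:
    """Template element; the same structure is reused across scenarios."""
    org_name: str
    org_type: str


@dataclass
class Person:
    """Template element; the same structure is reused across scenarios."""
    per_name: str


@dataclass
class InAndOut:
    """One record per person entering or leaving a post."""
    io_person: Person
    new_status: str                      # "IN" in the sample
    on_the_job: str                      # "YES" in the sample
    other_org: Optional[Organization] = None
    rel_other_org: Optional[str] = None


@dataclass
class SuccessionEvent:
    """One record per executive post (scenario-specific template)."""
    succession_org: Organization
    post: str
    in_and_out: List[InAndOut]
    vacancy_reason: str


def build_sample() -> SuccessionEvent:
    """Rebuild the sample templates for article 9402240133 as objects."""
    mccann = Organization("McCann", "COMPANY")
    jwt = Organization("J. Walter Thompson", "COMPANY")
    kim = Person("Peter Kim")
    hire = InAndOut(kim, "IN", "YES",
                    other_org=jwt, rel_other_org="OUTSIDE_ORG")
    return SuccessionEvent(
        succession_org=mccann,
        post="vice chairman, chief strategy officer, world-wide",
        in_and_out=[hire],
        vacancy_reason="OTH_UNK",
    )


def describe(event: SuccessionEvent) -> List[str]:
    """Follow the references from the event down to its participants."""
    return [
        f"{io.io_person.per_name} {io.new_status} as {event.post} "
        f"at {event.succession_org.org_name}"
        for io in event.in_and_out
    ]


if __name__ == "__main__":
    for line in describe(build_sample()):
        print(line)
```

Holding IN_AND_OUT records in a list reflects the remark that one
post may carry two such templates, one for the person leaving and one
for the person starting the job.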

Message understanding conference 6 a brief history

  • 1. zycnzj.com/ www.zycnzj.com Message U n d e r s t a n d i n g C o n f e r e n c e - 6: A Brief History Ralph Grishman Beth Sundheim Dept. of Computer Science Naval Command, Control and New York University Ocean Surveillance Center 715 Broadway, 7th Floor Research, Development, Test and New York, NY 10003, USA Evaluation Division (NRaD) grishman@c s. nyu. edu Code 44208 53140 Gatchell Road San Diego, CMifornia 92152-7420 sundheim@poj ke. nosc. mil Abstract is then evaluated against a manually-prepared an- swer key. We have recently completed the sixth The MUCs are remarkable in part because of in a series of "Message Understanding the degree to which these evaluations have defined Conferences" which are designed to pro- a prograin of research and development. DARPA mote and evaluate research in informa- has a number of information science and technol- tion extraction. MUC-6 introduced sev- ogy programs which are driven in large part, by eral innovations over prior MUCs, most regular evaluations. The MUCs are notable, how- notably in the range of different tasks for ever, in that they in large part have shaped the which evaluations were conducted. We research program in information extraction and describe some of the motivations for the brought it to its current s t a t e } new format and briefly discuss some of the results of the evaluations. 2 Early History 1 T h e M U C Evaluations MUC-1 (1987) was basically exploratory; each We have just completed the sixth in a series of group designed its own format for recording the Message Understanding Conferences, which have information in the document, and there was no been organized by NRAD, the R D T & E division of formal evaluation. By MUC-2 (1989), the task the Naval Command, Control and Ocean Surveil- had crystalized as one of template filling. 
One re- lance Center (formerly NOSC, the Naval Ocean ceives a description of a class of events to be iden- Systems Center) with the support of DARPA, tiffed in the text; for each of these events one must the Defense Advanced Research Projects Agency. fill a template with information about the event. This paper looks briefly at the history of these The template has slots for information about the Conferences and then examines the considerations event, such as the type of event, the agent, the which led to the structure of MUC-6} time and place, the effect, etc. For MUC-2, the The Message Understanding Conferences were template had 10 slots. Both MUC-1 and MUC- initiated by NOSC to assess and to foster research 2 involved sanitized forms of military messages on the a u t o m a t e d analysis of military messages about naval sightings and engagements. containing textual information. Although called The second MUC also worked out the details of "conferences", the distinguishing characteristic of the primary evaluation measures, recall and pre- the MUCs are not the conferences themselves, cision. To present it in simplest terms, suppose but the evaluations to which participants must the answer key has Nke~ filled slots; and that a submit in order to be permitted to attend the system fills Neor,.~t slots correctly and Nin~or,,~t conference. For each MUC, participating groups incorrectly (with some other slots possibly left un- have been given sample messages and instructions filled). Then on the type of information to be extracted, and have developed a system to process such messages. 
Ncorrect recall - Then, shortly before the conference, participants Nkey are given a set of test messages to be run through their system (without making any changes to the 2There were, however, a number of individual re- system); the output of each participant's system scm'eh efforts in information extraction underway be- [bre the first MUC, including the work on information 1The full proceedings of the conference are to be formatting of medieM narrative by Sager at New York distributed by Morgan Kaufmann Publishers, San Ma- University; the formatting of naval equipment failure teo, California; earlier MUC proeeedings~ for MUC-3, reports by Marsh at the Naval Research Laboratory; 4, and 5, are also available from Morgan Kaufmann. and the DBG work by Logieon for RADC. zycnzj.com/http://www.zycnzj.com/ 466
  • 2. zycnzj.com/ www.zycnzj.com precision = Nco,,;,.ect Each of these can been seen in part as a reaction Ncorrect + Nincorrect to the trends in the prior MUCs. The MUC-5 tasks, in particular, had been quite complex and For MUC-3 (1991), tile task shifted to reports a great effort had been invested by the government of terrorist events ill Central and South Amer- in preparing the training and test d a t a and by the ica, as reported in articles provided by the For- participants in adapting their systems for these eign Broadcast Information Service, and the tem- tasks. Most participants worked on the tasks for plate becmne somewhat more complex (18 slots). 6 months; a few (the Tipster contractors) had This same task was used for MUC-4 (1992), with a been at work on the tasks tbr consi(lerably longer. further small increase in template complexity (24 While the performance of solne systems was quite slots). impressive (the best got 57% recall, 64% precision MUC-5 (1993), which was conducted as part of overall, with 73% recall and 74% t)recision on the the Tipster program, a represented a substantial 4 "(:or(;" template types), tile question naturally fllrther j u m p in task complexity. Two tasks were arose as to whether there were m a n y apl)lieations involved, international joint ventures and elec- tbr which art investment of one or several develop- tronic circuit fabrication, in two hmgnages, En- ers over half->year (or more) could be justified. glish and Japanese. The joint venture task re- Furthermore, while so much effort had been ex- quired 11 templates with a total of 47 slots for pended, a large portion was specific to tire partic- the output double tile number of slots defined ular tasks. It wasn't clear whether much progress for M U C - 4 and the task documentation was was being made on the underlying technologies over 40 pages long. which would be needed for hetter understanding. 
One innovation of MUC-5 was the use of a To address these goals, the meeting formulated nested template structure. In earlier MUCs, each an ambitious menu of tasks for MUC-6, with the event had been represented as a single temi)late idea that individual participants could choose a • in effect, a single record in a data l)ase, with a subset of these tasks. We consider the three goals large nuinber of attributes. This format proved in the three sections below, and describe the tasks awkward when an event had several participmlts which were developed to address each goal. (e.g., several victims of a terrorist attack) and one wanted to record a set of facts about each partic- ipant. This sort of information (:ould be ranch 4 Short-term subtasks more easily recorded in the hierarchical structure The first goal was to identit~y, from the compo- introduced for MUC-5, in which there was a single nent technologies being developed for information template for an event, which pointed to a list of extraction, flmctions which would be of 1)ractical temI(lates, one for each particii)mlt in tile event;. 4 use, would be largely domain indet)endent, and couhl in the near term be performed automatically 3 MUC-6: initial goals with high ac('uracy. To meet this goal the con> mittce developed the "named entity" task, which 1)ARI)A convened a meeting of Tipster partici- t(asically involves identifying the names of all the pants and government representatives in Decca> people, organizations, and geographic locations in bet' 1993 to define goals and tasks tot M U C - 6 ) a text. Among the goals which were identified were The final task specification, which also involved • demonstrating taskqndependent component time, currency, and percentage expressions, used technologies of information extraction which SGML m a r k u p to identify the names in a text. would be immediately useflfl Figure 1 shows a sample sentence with named en- tity annotations. 
The tag ENAMEX ("entity name • encouraging work to make information ex- expression") is used for both people and organiza- tractioil systems in<)re portable tion names; the tag NUNEX ( " n u m e r i c expression") • encouraging work on "deeper understanding" is used for currency and I)ercentages. aTipster is a U.S. G o v e r m n e n t program of research 5 Portability and development in the areas of inibrmation retrieval and information extraction. The second goal was 1;o focus on portability in 4In fact the MUC-5 structure wa~smuch (nor(; com- the inibrmation extraction task the ability to plex, because there were separate temt)lates for prod- rapidly retarget a system to extract; information ucts, time, activities of organizations, etc. about a different class of events. The comnfit- '~The representatives of the resear(:h community tee felt that it was important to demonstrate that were Jim Cowie, lS(.alph Grishman (commit;tee chair), Jerry Hobbs, Paul Jacobs, Lea Schubert, Carl Weir, useful extraction systems eouht be created in a and Ralph Weischedel. The government people at- few weeks. To meet this goal, we decided that tending wcre George Doddington, Donna Harman, the infbrmation extraction task for MUC-6 wouhl Boyan Onyshkevych, John Prangc, Bill Schultheis, have to involve a relatively simple template, more and Beth Sundheim. like MUC-2 than MUC-5; this was duhbed "mini- zycnzj.com/http://www.zycnzj.com/ 467
  • 3. zycnzj.com/ www.zycnzj.com Mr. <ENAMEX TYPE="PERSON">Dooner</ENAMEX> met with <ENAMEX:TYPE="PERSON">Martin Puris</ENAMEX>, president and chief executive officer of < E N A M E X TYPE="ORGANIZATION">Ammirati & Puris</ENAMEX>, about <ENAMEX TYPE="ORGANIZATION">McCann</ENAMEX>~s acquiring the agency with billings of <NUMEX TYPE="MONEY">$400 million</NUMEX>, but nothing has materialized. Figure 1: Sample named entit;y annotation. MUC". In keeping with |;he hierarchical tem- on a description of a particular (:lass of events plate structure introduced in MUC-5, it was envi- (a "scenario") was called the "scenario template" sioned |;hat the inini-MUC would have an event- task. A sample scenario template is shown in the level template pointing to templates representing appendix. |;he partieitmnts in the event (people, orgmfiza- tions, products, e.tc.), me(liated perhaps by a "re- 6 Measures of deep understanding lational" level template. To further increase portability, a proposal was Another concern which was noted about the made to standardize the lowest-level tenlplates MUCs is that tile systenls we.re tending to- (for peoph',, orgaIfizations, etc.), since these basic wards relatively shallow understanding techlfiques (:lasses are involved in a wide variety of actions. In (based IIrimarily on local pa.ttern inatching), and this way, MUC participants could develop code for that not enough work was being done to build these low-level telnplates once, and then use them up the mechanisms needed for deeper understand- with m a n y different types of events. These low- ing. Therefore, tile committee, with strong en- level t;emptates were named "telnplate elements". 
couragement front I)AII.PA, included three MUC As the specification finally deveh)ped, tit(; rein- tasks which were intended to measure, aspex:ts of plate element for orgalfizations had six slots, for the internal processing of an inforlnation extra(:- the inaximal organization nalne, any aliases, the lion or hmguage understanding systenL These type, a descriptive noun phrase, the locale (inost three tasks, which were collectively called Se- specific location), and country. Slots are tilh,d mEwfl ("Senmntic Ewfluation") were: only if inforlnation is explicitly given in the text • C o r e f e r e n c e : the systent would have to (or, ill the ease of the country, can be inDrred mark coreferential noun t)hrases (the initial Doln an explicit locale). The text SlmCification envisioned marking set-subsel; We are striving to have a strong re- and part-whole relations, ill addition to iden- newed creative partnership with Coca- tity relations) Cola," Mr. Dooner says. However, o(lds • Word sense disambiguation: for each of that hapt;ening are slim since word ope.n (:lass word (noun, verb, a,djective, ad- from Coke headquarters in Atlanta is verb) in the text, the systein would have to that... determine its sense using the Wordlmt clas- wouht yiehl an organization telnplate elenmnt s|Ileal|on (its "synset", in Wordnet termii~of with live of these six slots filled: ogy) <0RGANIZATION-9402240133-5> := • Predicate-argument s t r u c t u r e : the sys- ORG NAME: "Coca-Cola" tem wouhl have to create a tree interrelating ORG ALIAS: "Coke" the constituellts of the sentence, using sonm ORG TYPE: COMPANY set of gralnma.tical flnmtional relations ORG_LOCALE: Atlanta CITY The committee recognized that, in seh;eting sneh ORG COUNTRY: United States internal measures, it, was inaking sortie presumI) (the first line identities this as organization tenl- tion regarding the structures and decisions which plate 5 from article 9402240]33). 
an analyzer should make in understanding a doc- Ever on the lookout for additional ewfluation llmellt. Not everyone would share these pre, sump- measm'es, the committee decide, d to nlake the cre- lions, lint participants in the next MU(J would ation of telnI)late eh,ments tbr all the people and be free 1;o enter the infornlation extraction evalu- organizations in a text a separate MUC task. lake ation and skip some or all of these internal ewdua- the named entil;y task, this was also seen as a Lions. Language understanding technology might potential demonstration of the ability of systelns develop in ways very diIii?rent from those imagined 1;o pertbrm a useflfl, relatively dolnain indepen- by the committee, and these internal evaluations dent task with near-term extraction te(:hnoh)gy might turn ollt t() t)e irrelevant distractions. How- (although it was recognized as being more dilli- ever, froln the current perslmctive of tnost of the cult than named entity, since it required merging eolnmittec, @ese seenmd fairly ])asic aspects of information from several places in the text). The unde, rstanding, and so an experinmnt in evahlat- old-style MUC information extraction task, based ing them (and encouraging improvem(mt in them) zycnzj.com/http://www.zycnzj.com/ 4:68
would be worthwhile.

7 Preparation process

Round 1: Resolution of SemEval

The committee had proposed a very ambitious program of evaluations. We now had to reduce these proposals to detailed specifications. The first step was to do some manual text annotation for the four tasks -- named entity and the SemEval triad -- which were quite different from what had been tried before. Brief specifications were prepared for each task, and in the spring of 1994 a group of volunteers (mostly veterans of earlier MUCs) annotated a short newspaper article using each set of specifications.

Problems arose with each of the SemEval tasks.

• For coreference, there were problems identifying part-whole and set-subset relations, and distinguishing the two; a decision was later made to limit ourselves to identity relations.

• For sense tagging, the annotators found that in some cases Wordnet made very fine distinctions and that making these distinctions consistently in tagging was very difficult.

• For predicate-argument structure, practically every new construct beyond simple clauses and noun phrases raised new issues which had to be collectively resolved.

Beyond these individual problems, it was felt that the menu was simply too ambitious, and that we would do better by concentrating on one element of the SemEval triad for MUC-6; at a meeting held in June 1994, a decision was made to go with coreference. In part, this reflected a feeling that the problems with the coreference specification were the most amenable to solution. It also reflected a conviction that coreference identification had been, and would remain, critical to success in information extraction, and so it was important to encourage advances in coreference. In contrast, most extraction systems did not build full predicate-argument structures, and word-sense disambiguation played a relatively small role in extraction (particularly since extraction systems operated in a narrow domain).

The coreference task, like the named entity task, was annotated using SGML notation. A COREF tag has an ID attribute which identifies the tagged noun phrase or pronoun. It may also have an attribute of the form REF=n, which indicates that this phrase is coreferential with the phrase with ID n. Figure 2 shows an excerpt from an article, annotated for coreference.⁶

    Maybe <COREF ID="136" REF="134">he</COREF>'ll even leave something from <COREF ID="138" REF="139"><COREF ID="137" REF="136">his</COREF> office</COREF> for <COREF ID="140" REF="91">Mr. Dooner</COREF>. Perhaps <COREF ID="144">a framed page from the New York Times, dated Dec. 8, 1987, showing a year-end chart of the stock market crash earlier that year</COREF>. <COREF ID="141" REF="137">Mr. James</COREF> says <COREF ID="142" REF="141">he</COREF> framed <COREF ID="143" REF="144" STATUS="OPT">it</COREF> and kept <COREF ID="145" REF="144">it</COREF> by <COREF ID="146" REF="142">his</COREF> desk as a "personal reminder. It can all be gone like that."

    Figure 2: Sample coreference annotation.

Round 2: annotation

The next step was the preparation of a substantial training corpus for the two novel tasks which remained (named entity and coreference). SRA Corporation kindly provided tools which aided in the annotation process. Again a stalwart group of volunteer annotators was assembled;⁷ each was provided with 25 articles from the Wall Street Journal. There was some overlap between the articles assigned, so that we could measure the consistency of annotation between sites. This annotation was done in the winter of 1994-95.

A major role of the annotation process was to identify and resolve problems with the task specifications. For named entities, this was relatively straightforward. For coreference, it proved remarkably difficult to formulate guidelines which were reasonably complete and consistent.⁸

Round 3: dry run

Once the task specifications seemed reasonably stable, NRaD organized a "dry run" -- a full-scale rehearsal for MUC-6, but with all results reported anonymously. The dry run took place in April 1995, with a scenario involving labor union contract negotiations. Of the sites which were involved in the annotation process, ten participated in the dry run. Results of the dry run were reported at the Tipster Phase II 12-month meeting in May 1995.

8 The formal evaluation

The MUC-6 formal evaluation was held in September 1995. The scenario definition was distributed at the beginning of September; the test data was distributed four weeks later, with results due by the end of the week. The scenario involved changes in corporate executive management personnel. The evaluation met many of the goals which had been set by the initial planning conference in December of 1993.

There were evaluations for four tasks: named entity, coreference, template element, and scenario template. There were 16 participants; 15 participated in the named entity task, 7 in coreference, 11 in template element, and 9 in scenario template.

Named entity was intended to be a simple task on which systems could demonstrate a high level of performance ... high enough for immediate use. Our success in this task exceeded our expectations. The majority of sites had recall and precision over 90%; the highest-scoring system had a recall of 96% and a precision of 97%. Although one must keep in mind the somewhat limited range of texts in the test set (all are from the Wall Street Journal, in particular), the results are excellent. A couple of these systems have been commercialized, and several are being incorporated into government text-processing systems. Given this level of performance, there is probably little point in repeating this task with the same ground rules in a future MUC (although there might be interest in processing monocase text and in performing comparable tasks on a more varied corpus and for languages other than English).

The template element task, while superficially similar to named entities -- it is also based on identifying people and organizations -- is significantly more difficult. One has to identify descriptions of entities ("a distributor of kumquats") as well as names. If an entity is mentioned several times, possibly using descriptions or different forms of a name, these need to be identified together; there should be only one template element for each entity in an article. Consequently, the scores were appreciably lower, ranging across most systems from 65 to 75% in recall, and from 75% to 85% in precision. The top-scoring system had 75% recall, 86% precision. Systems did particularly poorly in identifying descriptions; the highest-scoring system had 38% recall and 51% precision for descriptions.

There seemed general agreement that having prepared code for template elements in advance did make it easier to port a system to a new scenario in a few weeks. This factor, and the room that exists for improvement in performance, suggest that including this task in a future MUC may be worthwhile.

The goal for scenario templates -- mini-MUC -- was to demonstrate that effective information extraction systems could be created in a few weeks. This too was successful. Although it is difficult to meaningfully compare results on different scenarios, the scores obtained by most systems after a few weeks (40% to 50% recall, 60% to 70% precision) were comparable to the best scores obtained in prior MUCs. The highest performance overall was 47% recall and 70% precision.

One can observe an increasing convergence of methods for information extraction. Most of the systems participating in MUC-6 employed a cascade of finite-state pattern recognizers, with the earlier pattern sets recognizing entities, and the later sets recognizing scenario-specific patterns. This convergence may be one reason for the bunching of scores for this task -- most systems fell in a rather narrow range in both recall and precision.

The results of this MUC provide valuable positive testimony on behalf of information extraction, but further improvement in both portability and performance is needed for many applications. With respect to portability, customers would like to have systems which can be ported in a few hours, or at most a few days, by someone with less expertise than a system developer. How this might be tested in the context of a MUC is not entirely clear. For one thing, most sites spent several days just studying the scenario description and annotated corpus, in order to understand the scenario definition, before coding began. Perhaps a micro-MUC,⁹ with an even simpler template structure, is needed to push the limits of portability. Getting systems which can be customized by others is also a tall order, given the complexity and variety of knowledge sources needed for a typical MUC information extraction task.

With respect to performance, the bunching of scores suggests that many sites were able to solve a common set of "easy" problems, but were stymied in processing messages which involved "hard" problems. Whether this is true, and just what the hard problems are, will require more extensive analysis of the results of MUC-6. Are the shortcomings due primarily to a lack of coverage in the basic patterns, to a lack of background knowledge in the domain, to failures in coreference, or something else? We may hope that the failings are primarily in one area, so that we may concentrate our energies there, but more likely the failings will be in many areas, and broad improvements in extraction engines will be needed to improve performance.

⁶ The TYPE and MIN attributes which appear in the actual annotation have been omitted here for the sake of readability.

⁷ The annotation groups were from BBN, Brandeis Univ., the Univ. of Durham, Lockheed-Martin, New Mexico State Univ., NRaD, New York Univ., PRC, the Univ. of Pennsylvania, SAIC (San Diego), SRA, SRI, the Univ. of Sheffield, Southern Methodist Univ., and Unisys.

⁸ As experienced computational linguists, we probably should have known better than to think this was an easy task.

⁹ A term suggested by George Krupka.
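The COREF notation described in Section 7 (an ID attribute on every tagged phrase, plus an optional REF=n pointing at a coreferential phrase) can be read mechanically. The following is only a minimal sketch in Python, not the MUC-6 annotation or scoring software: the function names `parse_mentions` and `coreference_chains` are hypothetical, the regular expressions assume well-formed tags like those in the Figure 2 excerpt, and REF is treated as a plain identity link, which is an assumption about how chains would be assembled.

```python
import re
from collections import defaultdict

# Matches an opening <COREF ...> tag (capturing its attributes) or a closing tag.
TAG = re.compile(r'<COREF\s+([^>]*)>|</COREF>')
ATTR = re.compile(r'(\w+)="([^"]*)"')

def parse_mentions(text):
    """Map each COREF ID to its tagged text and its REF value (None if absent)."""
    mentions, stack, pos = {}, [], 0
    for m in TAG.finditer(text):
        chunk = text[pos:m.start()]
        for _, pieces in stack:            # text belongs to every open (nested) tag
            pieces.append(chunk)
        pos = m.end()
        if m.group(1) is None:             # closing tag: finish the innermost span
            attrs, pieces = stack.pop()
            mentions[attrs["ID"]] = {"text": "".join(pieces),
                                     "ref": attrs.get("REF")}
        else:                              # opening tag: remember its attributes
            stack.append((dict(ATTR.findall(m.group(1))), []))
    return mentions

def coreference_chains(mentions):
    """Group IDs linked by REF into chains, using a small union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for mention_id, info in mentions.items():
        root = find(mention_id)
        if info["ref"] is not None:        # REF may point outside the excerpt
            parent[root] = find(info["ref"])
    chains = defaultdict(set)
    for x in list(parent):
        chains[find(x)].add(x)
    return list(chains.values())

# Excerpt adapted from Figure 2 (shortened; trailing quotation omitted).
SAMPLE = (
    'Maybe <COREF ID="136" REF="134">he</COREF>\'ll even leave something from '
    '<COREF ID="138" REF="139"><COREF ID="137" REF="136">his</COREF> office</COREF> '
    'for <COREF ID="140" REF="91">Mr. Dooner</COREF>. '
    'Perhaps <COREF ID="144">a framed page from the New York Times</COREF>. '
    '<COREF ID="141" REF="137">Mr. James</COREF> says '
    '<COREF ID="142" REF="141">he</COREF> framed '
    '<COREF ID="143" REF="144" STATUS="OPT">it</COREF> and kept '
    '<COREF ID="145" REF="144">it</COREF> by '
    '<COREF ID="146" REF="142">his</COREF> desk.'
)

mentions = parse_mentions(SAMPLE)
chains = coreference_chains(mentions)
for chain in chains:
    print(sorted(chain, key=int))
```

On this excerpt the links group "he", "his", "Mr. James", "he" and "his" into one chain together with ID 134, which lies outside the excerpt. The sketch ignores the STATUS, TYPE and MIN attributes entirely.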
Pushing improvements in the underlying technology was one of the goals of SemEval and its current survivor, coreference. Much of the energy for the current round, however, went into honing the definition of the task. Philosophers of language have been arguing over reference and coreference for centuries, so we should not have been surprised that it would be so hard to prepare a precise and consistent definition. Additional work on the definition will be necessary, and it may be necessary to narrow the task further. Despite these distractions, a few interesting early results were obtained regarding coreference methods; we may hope that, once the task specification settles down, the availability of coreference-annotated corpora and the chance for glory in further evaluations will encourage more work in this area.

Appendix: Sample Scenario Template

Shown below is a set of templates for the MUC-6 scenario template task. The scenario involved changes in corporate executive management personnel. For the text

    McCann has initiated a new so-called global collaborative system, composed of world-wide account directors paired with creative partners. In addition, Peter Kim was hired from WPP Group's J. Walter Thompson last September as vice chairman, chief strategy officer, world-wide.

the following templates were to be generated:

    <SUCCESSION_EVENT-9402240133-3> :=
        SUCCESSION_ORG: <ORGANIZATION-9402240133-1>
        POST: "vice chairman, chief strategy officer, world-wide"
        IN_AND_OUT: <IN_AND_OUT-9402240133-5>
        VACANCY_REASON: OTH_UNK
    <IN_AND_OUT-9402240133-5> :=
        IO_PERSON: <PERSON-9402240133-5>
        NEW_STATUS: IN
        ON_THE_JOB: YES
        OTHER_ORG: <ORGANIZATION-9402240133-8>
        REL_OTHER_ORG: OUTSIDE_ORG
    <ORGANIZATION-9402240133-1> :=
        ORG_NAME: "McCann"
        ORG_TYPE: COMPANY
    <ORGANIZATION-9402240133-8> :=
        ORG_NAME: "J. Walter Thompson"
        ORG_TYPE: COMPANY
    <PERSON-9402240133-5> :=
        PER_NAME: "Peter Kim"

Although we cannot explain all the details of the template here, a few highlights should be noted. For each executive post, one generates a SUCCESSION_EVENT template, which contains references to the ORGANIZATION template for the organization involved, and the IN_AND_OUT template for the activity involving that post (if an article describes a person leaving and a person starting the same job, there will be two IN_AND_OUT templates). The IN_AND_OUT template contains references to the templates for the PERSON and for the ORGANIZATION from which the person came (if he/she is starting a new job). The PERSON and ORGANIZATION templates are the "template element" templates, which are invariant across scenarios.
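The pointer structure of these templates (a SUCCESSION_EVENT referring to an ORGANIZATION and an IN_AND_OUT, which in turn refers to a PERSON and an ORGANIZATION) can be made concrete with a small reader for the listing. This is only an illustrative Python sketch under stated assumptions: the functions `parse_templates` and `expand` are hypothetical, the line-oriented `<NAME> :=` / `SLOT: value` layout is taken from the listing in this appendix rather than from any official MUC-6 file format, and each slot is assumed to hold a single value.

```python
import re

# A template header such as "<PERSON-9402240133-5> :=" and a pointer value
# such as "<PERSON-9402240133-5>"; both patterns are assumptions based on
# the appendix listing.
HEADER = re.compile(r'<([A-Z_]+-\d+-\d+)>\s*:=')
POINTER = re.compile(r'<([A-Z_]+-\d+-\d+)>')

def parse_templates(listing):
    """Read the listing into {template_name: {slot: raw_value}}."""
    templates, current = {}, None
    for raw in listing.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER.fullmatch(line)
        if header:
            current = templates.setdefault(header.group(1), {})
        else:
            slot, _, value = line.partition(":")
            current[slot.strip()] = value.strip()
    return templates

def expand(templates, name, _path=()):
    """Replace pointer slots by the templates they refer to (cycle-safe)."""
    expanded = {}
    for slot, value in templates[name].items():
        pointer = POINTER.fullmatch(value)
        target = pointer.group(1) if pointer else None
        if target in templates and target not in _path:
            expanded[slot] = expand(templates, target, _path + (name,))
        else:
            expanded[slot] = value.strip('"')   # plain fill: drop the quotes
    return expanded

LISTING = """
<SUCCESSION_EVENT-9402240133-3> :=
    SUCCESSION_ORG: <ORGANIZATION-9402240133-1>
    POST: "vice chairman, chief strategy officer, world-wide"
    IN_AND_OUT: <IN_AND_OUT-9402240133-5>
    VACANCY_REASON: OTH_UNK
<IN_AND_OUT-9402240133-5> :=
    IO_PERSON: <PERSON-9402240133-5>
    NEW_STATUS: IN
    ON_THE_JOB: YES
    OTHER_ORG: <ORGANIZATION-9402240133-8>
    REL_OTHER_ORG: OUTSIDE_ORG
<ORGANIZATION-9402240133-1> :=
    ORG_NAME: "McCann"
    ORG_TYPE: COMPANY
<ORGANIZATION-9402240133-8> :=
    ORG_NAME: "J. Walter Thompson"
    ORG_TYPE: COMPANY
<PERSON-9402240133-5> :=
    PER_NAME: "Peter Kim"
"""

templates = parse_templates(LISTING)
event = expand(templates, "SUCCESSION_EVENT-9402240133-3")
print(event["IN_AND_OUT"]["IO_PERSON"]["PER_NAME"])
```

Expanding the pointers shows why the PERSON and ORGANIZATION templates can be scored separately as template elements: they are self-contained leaves, while the scenario-specific templates only refer to them.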